Abstract

We consider mean squared estimation with lookahead of a continuous-time signal corrupted by additive white 
Gaussian noise. We show that the mutual information rate function, i.e., the mutual information rate as a function
of the signal-to-noise ratio (SNR), does not, in general, determine the minimum mean squared error (MMSE) with
fixed finite lookahead, in contrast to the special cases of zero and infinite lookahead (filtering and smoothing errors,
respectively), which were previously established in the literature. We also establish a new expectation identity under a
generalized observation model where the Gaussian channel has an SNR jump at t = 0, capturing the tradeoff between
lookahead and SNR. 

Further, we study the class of continuous-time stationary Gauss-Markov processes (Ornstein-Uhlenbeck processes) 
as channel inputs, and explicitly characterize the behavior of the minimum mean squared error (MMSE) with finite 
lookahead and signal-to-noise ratio (SNR). The MMSE with lookahead is shown to converge exponentially rapidly 
to the non-causal error, with the exponent being the reciprocal of the non-causal error. We extend our results to 
mixtures of Ornstein-Uhlenbeck processes, and use the insight gained to present lower and upper bounds on the 
MMSE with lookahead for a class of stationary Gaussian input processes, whose spectrum can be expressed as a 
mixture of Ornstein-Uhlenbeck spectra. 

Index Terms 

Mutual information, mean squared error (MSE), Brownian motion, Gaussian channel, additive white Gaussian 
noise (AWGN), causal/filtering error, non-causal/smoothing error, lookahead/delay, Ornstein-Uhlenbeck process, 
stationary process, power spectral density, signal-to-noise ratio (SNR) 

I. Introduction 

Mean squared estimation of a signal in the presence of Gaussian noise has been a topic of considerable importance 
to the communities of information theory and estimation theory. There have further been discoveries tying the two 
fields together, through identities between fundamental informational quantities and the squared estimation loss. 

Consider the continuous-time Gaussian channel. In [1], Duncan established the equivalence between the input-output
mutual information and the integral of half the causal mean squared error in estimating the signal based on
the observed process. In [2], Guo et al. present what is now known as the I-MMSE relationship, which equates
the derivative of the mutual information rate to half the average non-causal squared error. These results were
extended to incorporate mismatch at the decoder for the continuous-time Gaussian channel by Weissman in [3],
building upon similar relations for the scalar Gaussian channel by Verdú in [4]. In [3], it was shown that when
the decoder assumes an incorrect law for the input process, the difference in mismatched filtering loss is simply
given by the relative entropy between the true and incorrect output distributions. Recently, in [5], a unified view
of the aforementioned identities was presented. In particular, pointwise analogues of these and related results were 
developed, characterizing the original relationships as identities involving expectations of random objects with 
information-estimation significance of their own. 

In this work, we shall focus on stationary continuous-time signals corrupted by Gaussian noise. Let
$X = \{X_t, t \in \mathbb{R}\}$ denote a continuous-time signal, which is the channel input process. We assume square
integrability of the stochastic process X, which says that for any finite interval $[a, b] \subset \mathbb{R}$,
$E[\int_a^b X_t^2\, dt] < \infty$.

The output process Y of the continuous-time Gaussian channel with input X is given by the following model

$$dY_t = \sqrt{\mathrm{snr}}\, X_t\, dt + dW_t, \tag{1}$$

for all t, where snr > 0 is the channel signal-to-noise ratio (SNR) and $W_\cdot$ is a standard Brownian motion (cf. [6]
for definition and properties of Brownian motion) independent of X. For simplicity let us assume the input process
to be stationary. Throughout, we will denote $X_a^b = \{X_t : a \le t \le b\}$.
Let $I(\mathrm{snr})$ denote the mutual information rate, given by

$$I(\mathrm{snr}) = \lim_{T \to \infty} \frac{1}{T} I(X_0^T; Y_0^T). \tag{2}$$

Let mmse(snr) denote the smoothing squared error

$$\mathrm{mmse}(\mathrm{snr}) = \lim_{T \to \infty} \frac{1}{T} \int_0^T E[(X_t - E[X_t | Y_0^T])^2]\, dt, \tag{3}$$

which, in our setting of stationarity, can equivalently be defined as

$$\mathrm{mmse}(\mathrm{snr}) = E[(X_0 - E[X_0 | Y_{-\infty}^{+\infty}])^2]. \tag{4}$$

Similarly, the causal squared error is given by

$$\mathrm{cmmse}(\mathrm{snr}) = \lim_{T \to \infty} \frac{1}{T} \int_0^T E[(X_t - E[X_t | Y_0^t])^2]\, dt \tag{5}$$

$$= E[(X_0 - E[X_0 | Y_{-\infty}^{0}])^2]. \tag{6}$$

From [1] and [2], we know that the above quantities are related according to

$$\frac{2 I(\mathrm{snr})}{\mathrm{snr}} = \mathrm{cmmse}(\mathrm{snr}) = \frac{1}{\mathrm{snr}} \int_0^{\mathrm{snr}} \mathrm{mmse}(\gamma)\, d\gamma \tag{7}$$

for all signal-to-noise ratios, snr > 0, and for all choices of the underlying input distribution¹ for which the process
X is square integrable. Thus, the mutual information, the causal error and the non-causal error are linked together
in a distribution-independent fashion for the Gaussian channel under mean-squared loss. Having understood how
the time-averaged causal and non-causal errors behave, and in particular the relation (7) implying that they are
completely characterized by the mutual information rate function, the current work seeks to address the question:
what happens for lookahead between 0 and $\infty$?

¹ Indeed, (7) continues to hold in the absence of stationarity of the input process, in which case the respective quantities are averaged over a
fixed, finite interval [0, T].

The mean squared error with finite lookahead $d \ge 0$ is defined as

$$\mathrm{lmmse}(d, \mathrm{snr}) = E[(X_0 - E[X_0 | Y_{-\infty}^{d}])^2], \tag{8}$$

where it is instructive to note that the special cases $d = 0$ and $d = \infty$ in (8) yield the filtering (6) and smoothing
(4) errors respectively. Let us be slightly adventurous, and rewrite (7) as

$$\frac{2 I(\mathrm{snr})}{\mathrm{snr}} = E[\mathrm{lmmse}(0, \Gamma_0)] = E[\mathrm{lmmse}(\infty, \Gamma_\infty)], \tag{9}$$

where the random variables $\Gamma_0, \Gamma_\infty$ are distributed according to $\Gamma_0 = \mathrm{snr}$ a.s., and
$\Gamma_\infty \sim U[0, \mathrm{snr}]$, where U[a, b] denotes the uniform distribution on [a, b]. This characterization of
the mutual information rate may lead one to conjecture the existence of a random variable $\Gamma_d$, whose distribution
depends only on d, snr, and some features of the process (such as its bandwidth and spectrum), and satisfies the identity

$$\frac{2 I(\mathrm{snr})}{\mathrm{snr}} = E[\mathrm{lmmse}(d, \Gamma_d)], \tag{10}$$

regardless of the distribution of X. I.e., there exists a family of distributions $\Gamma_d$ ranging from one extreme of a point
mass at snr for $d = 0$, to the uniform distribution over [0, snr] when $d = \infty$. One of the corollaries of our results
in this work is that such a relation cannot hold in general, even when allowing the distribution of $\Gamma_d$ to depend
on distribution-dependent features such as the process spectrum. In fact, in Section VII we show that in general,
the mutual information rate function of a process is not sufficient to characterize the estimation error with finite
lookahead. Therefore, one would need to consider features of the process that go beyond the mutual information
rate and the spectrum in order to characterize the finite lookahead estimation error. Motivated by this question,
however, in Section VI we establish an identity which relates the filtering error to a double integral of the mean
squared error, over lookahead and SNR, for the Gaussian channel with an SNR jump at t = 0.

In the literature, finite lookahead estimation has been referred to as fixed-lag smoothing (cf., for example, [7]
and references therein). Note that (8) is well defined for all $d \in \mathbb{R}$. For d < 0, lmmse(d, snr) denotes the finite
horizon average prediction mean square error, which is as meaningful as its finite lookahead counterpart, i.e. the
case when d > 0. Motivated by the desire to understand the tradeoff between lookahead and mmse, in Section II we
explicitly characterize this trade-off for the canonical family of continuous-time stationary Gauss-Markov
(Ornstein-Uhlenbeck) processes. In particular, we find that the rate of convergence of lmmse(·, snr) with increasing
lookahead (from causal to the non-causal error) is exponential, with the exponent given by the inverse of the
non-causal error, i.e. $1/\mathrm{mmse}(\mathrm{snr})$ itself.

The rest of the paper is organized as follows. In Section II we explicitly characterize the lookahead-MSE trade-off
for the canonical family of continuous-time stationary Gauss-Markov (Ornstein-Uhlenbeck) processes. In particular,
we find that the convergence of lmmse(·, snr) with increasing lookahead (from causal to the non-causal error) is
exponentially fast, with the exponent given by the inverse of the non-causal error, i.e. $1/\mathrm{mmse}(\mathrm{snr})$
itself. In Section III we extend our results to a larger class of processes, that are expressible as a mixture of
Ornstein-Uhlenbeck processes. We then consider stationary Gaussian input processes in Section IV and characterize
the MMSE with lookahead via spectral factorization methods. In Section V we consider the information utility of
infinitesimal lookahead for a general stationary input process and relate it to the squared filtering error. In
Section VI we introduce a generalized observation model for a stationary continuous-time signal where the channel
has a non-stationary SNR jump at t = 0. For this model, we establish an identity relating the squared error with
finite lookahead and the causal filtering error. In Section VII we show by means of an example that a
distribution-independent identity of the form in (10) cannot hold in general. We conclude with a summary of our
findings in Section VIII.

II. Ornstein-Uhlenbeck Process 

A. Definitions and Properties 

The Ornstein-Uhlenbeck process [8] is a classical continuous-time stochastic process characterized by the following
stochastic differential equation

$$dX_t = \alpha(\mu - X_t)\, dt + \beta\, dB_t, \tag{11}$$

where $\{B_t\}_{t \ge 0}$ is a standard Brownian motion and $\alpha, \mu, \beta$ are process parameters. The reader is
referred to [9] for a historical perspective on this process. Below, we present some of its important statistical
properties. The mean and covariance functions are given by:

$$E(X_t) = \mu, \tag{12}$$

and,

$$\mathrm{Cov}(X_t, X_s) = \frac{\beta^2}{2\alpha} e^{-\alpha |t - s|}. \tag{13}$$

We can further denote the autocorrelation function and power spectral density of this process by

$$R_X(\tau) = \frac{\beta^2}{2\alpha} e^{-\alpha |\tau|}, \tag{14}$$

$$S_X(\omega) = \frac{\beta^2}{\alpha^2 + \omega^2}. \tag{15}$$

In all further analysis, we consider the process mean $\mu$ to be 0. We also note that all expressions are provided
assuming $\alpha > 0$, which results in a mean-reverting evolution of the process.
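As a quick sanity check of (13) and (14), the sketch below samples a zero-mean stationary path on a time grid and compares empirical autocovariances with $\frac{\beta^2}{2\alpha} e^{-\alpha|\tau|}$. It assumes the exact Gaussian one-step transition implied by (11) with $\mu = 0$; the helper names `simulate_ou` and `sample_autocov`, the grid spacing and the seed are illustrative choices and do not come from the paper.

```python
import math
import random

def simulate_ou(alpha, beta, dt, n_steps, seed=0):
    """Sample a zero-mean stationary OU path on a grid of spacing dt.

    Uses the exact Gaussian transition of the SDE, so the grid spacing
    introduces no discretization bias.
    """
    rng = random.Random(seed)
    var = beta**2 / (2.0 * alpha)            # stationary variance
    rho = math.exp(-alpha * dt)              # one-step correlation
    innov_std = math.sqrt(var * (1.0 - rho**2))
    x = rng.gauss(0.0, math.sqrt(var))       # start in the stationary law
    path = [x]
    for _ in range(n_steps):
        x = rho * x + innov_std * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

def sample_autocov(path, lag):
    """Empirical E[X_t X_{t+lag}] for a zero-mean path."""
    n = len(path) - lag
    return sum(path[i] * path[i + lag] for i in range(n)) / n

alpha, beta, dt = 0.5, 1.0, 0.05
path = simulate_ou(alpha, beta, dt, n_steps=200_000)
for lag in (0, 20, 40):
    tau = lag * dt
    theory = beta**2 / (2.0 * alpha) * math.exp(-alpha * tau)
    print(f"tau = {tau:.1f}  empirical = {sample_autocov(path, lag):.3f}  theory = {theory:.3f}")
```

The empirical values fluctuate around the theoretical ones with a statistical error that shrinks as the path gets longer.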

B. Mean Squared Estimation Error 

Recall the mean squared error with finite lookahead d defined in (8), from which one can infer the filtering (6)
and smoothing (4) errors respectively. We now compute these quantities for the Ornstein-Uhlenbeck process at a
fixed signal-to-noise ratio $\gamma$. Define, for the Ornstein-Uhlenbeck process corrupted by the Gaussian channel (1),
the error in estimating $X_0$ based on a finite lookahead d into the future and a certain lookback l into the past:

$$v(l, d, \gamma) = \mathrm{Var}(X_0 | Y_{-l}^{d}), \quad d, l \ge 0. \tag{16}$$

Before we explicitly characterize $v(l, d, \gamma)$, we present the following intermediate result, which is proved in
Appendix A.

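Lemma 1: Let $e_d = v(0, d, \gamma) = v(d, 0, \gamma)$ (see footnote 2). Then,

$$e_d = \frac{\sqrt{\alpha^2 + \gamma\beta^2}}{\gamma} \cdot \frac{\rho\, e^{2d\sqrt{\alpha^2 + \gamma\beta^2}} + 1}{\rho\, e^{2d\sqrt{\alpha^2 + \gamma\beta^2}} - 1} - \frac{\alpha}{\gamma}, \tag{17}$$

where

$$\rho = \frac{\gamma R_X(0) + \sqrt{\alpha^2 + \gamma\beta^2} + \alpha}{\gamma R_X(0) - \sqrt{\alpha^2 + \gamma\beta^2} + \alpha}.$$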

Using Lemma 1 we now compute the more general quantity $v(l, d, \gamma)$ defined in (16), which denotes the loss in
estimating $X_0$ based on the observations $Y_{-l}^{d}$, for a stationary process.

Lemma 2 (estimation error with finite window of observations in the past and future):

$$v(l, d, \gamma) = \frac{e_l\, e_d\, R_X(0)}{R_X(0)(e_l + e_d) - e_l\, e_d}. \tag{18}$$

The proof of Lemma 2 is straightforward, upon noting (i) the Markov chain relationship $Y_{-l}^{0} - X_0 - Y_{0}^{d}$, and (ii)
the joint Gaussianity of $(X_0, Y_{-l}^{d})$. Thus, one can use the conditional variances $\mathrm{Var}(X_0 | Y_{-l}^{0})$ and
$\mathrm{Var}(X_0 | Y_{0}^{d})$ to obtain the required expression for $\mathrm{Var}(X_0 | Y_{-l}^{d})$.
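The combination rule in Lemma 2 is the usual Gaussian precision identity $1/v = 1/e_l + 1/e_d - 1/R_X(0)$. The sketch below checks this numerically and confirms that two infinite windows reproduce the smoothing error. It assumes the steady-state Kalman-Bucy filtering error $(\sqrt{\alpha^2 + \gamma\beta^2} - \alpha)/\gamma$ as the limit of $e_d$ for an infinite window; the function names and parameter values are illustrative.

```python
import math

def filtering_error(alpha, beta, gamma):
    """Steady-state causal error, i.e. the limit of e_d as d grows."""
    return (math.sqrt(alpha**2 + gamma * beta**2) - alpha) / gamma

def combine(e_l, e_d, r0):
    """Error from a past window (e_l) and a future window (e_d), as in (18)."""
    return e_l * e_d * r0 / (r0 * (e_l + e_d) - e_l * e_d)

alpha, beta, gamma = 0.5, 1.0, 1.0     # illustrative values
r0 = beta**2 / (2.0 * alpha)           # prior variance R_X(0)
e_inf = filtering_error(alpha, beta, gamma)

# (18) rewritten in terms of precisions (inverse variances).
v = combine(e_inf, e_inf, r0)
precision = 1.0 / e_inf + 1.0 / e_inf - 1.0 / r0

# With both windows infinite, v should match the smoothing error.
smoothing = beta**2 / (2.0 * math.sqrt(alpha**2 + gamma * beta**2))
print(f"v = {v:.6f}, 1/precision = {1.0 / precision:.6f}, smoothing = {smoothing:.6f}")
```

The prior precision $1/R_X(0)$ is subtracted once because both one-sided errors already include the prior on $X_0$.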

Using Lemma 1 and Lemma 2 we explicitly characterize the filtering (6) and smoothing (4) mean squared errors
(MSEs) for the noise-corrupted Ornstein-Uhlenbeck process, in the following lemma.

Lemma 3 (MSE for the Ornstein-Uhlenbeck process): For the Ornstein-Uhlenbeck process corrupted by white
Gaussian noise at signal-to-noise ratio snr, the filtering and smoothing errors are given by:

• Filtering error

$$\mathrm{cmmse}(\mathrm{snr}) = v(\infty, 0, \mathrm{snr}) = \frac{\sqrt{\alpha^2 + \mathrm{snr}\,\beta^2} - \alpha}{\mathrm{snr}} \tag{19}$$

• Smoothing error

$$\mathrm{mmse}(\mathrm{snr}) = v(\infty, \infty, \mathrm{snr}) = \frac{\beta^2}{2\sqrt{\alpha^2 + \mathrm{snr}\,\beta^2}} \tag{20}$$

The expressions in (19) and (20) recover the classical expressions for the optimal causal and smoothing errors
(for Gaussian inputs), due to Yovits and Jackson in [12], and Wiener in [13], respectively.
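The closed forms for the Ornstein-Uhlenbeck filtering and smoothing errors can be checked against the general relation (7), which says that the causal error at snr is the average of the non-causal error over SNR levels in [0, snr]. The sketch below does this with a simple midpoint rule; the function names and the grid size are illustrative assumptions.

```python
import math

def cmmse(snr, alpha, beta=1.0):
    """Filtering error of the OU process at signal-to-noise ratio snr."""
    return (math.sqrt(alpha**2 + snr * beta**2) - alpha) / snr

def mmse(snr, alpha, beta=1.0):
    """Smoothing error of the OU process at signal-to-noise ratio snr."""
    return beta**2 / (2.0 * math.sqrt(alpha**2 + snr * beta**2))

def averaged_mmse(snr, alpha, beta=1.0, n=20_000):
    """Midpoint-rule value of (1/snr) * integral_0^snr mmse(g) dg."""
    h = snr / n
    total = sum(mmse((k + 0.5) * h, alpha, beta) for k in range(n))
    return total * h / snr

for alpha in (0.5, 1.0):
    for snr in (1.0, 4.0):
        gap = abs(cmmse(snr, alpha) - averaged_mmse(snr, alpha))
        print(f"alpha = {alpha}, snr = {snr}: |cmmse - averaged mmse| = {gap:.2e}")
```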



Remark: (19) and (20) can be easily seen to verify the following general relationship between the filtering and
smoothing error for the continuous-time Gaussian channel, established in [2], i.e.

$$\mathrm{cmmse}(\mathrm{snr}) = \frac{1}{\mathrm{snr}} \int_0^{\mathrm{snr}} \mathrm{mmse}(\gamma)\, d\gamma. \tag{21}$$

C. Estimation with Lookahead



Having introduced the Ornstein-Uhlenbeck process and its properties, we now study the behavior of the MMSE
with finite lookahead.

Note: For the rest of this paper we will set $\beta = 1$ in (11), as it only involves an effective scaling of the channel
SNR parameter.

² The latter equality in the definition of $e_d$ in Lemma 1 holds due to time-reversibility of the Ornstein-Uhlenbeck process.

Recall that,

$$\mathrm{lmmse}(d, \gamma) = \mathrm{Var}(X_0 | Y_{-\infty}^{d}) \tag{22}$$

$$= v(\infty, d, \gamma), \tag{23}$$

where (23) follows from the definition of $v(\cdot, \cdot, \gamma)$ in Lemma 2. The following lemma explicitly characterizes the
finite lookahead MMSE for the Ornstein-Uhlenbeck process with parameter $\alpha$, corrupted by the Gaussian channel
at signal-to-noise ratio $\gamma$.

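Lemma 4 (MMSE with Lookahead for OU($\alpha$)):

$$\mathrm{lmmse}(d, \gamma) = \begin{cases} (1 - e^{-2d\sqrt{\alpha^2 + \gamma}})\, \mathrm{mmse}(\gamma) + e^{-2d\sqrt{\alpha^2 + \gamma}}\, \mathrm{cmmse}(\gamma) & \text{if } d \ge 0 \\ e^{-2\alpha|d|}\, \mathrm{cmmse}(\gamma) + \frac{1}{2\alpha}(1 - e^{-2\alpha|d|}) & \text{if } d < 0. \end{cases} \tag{24}$$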

Proof: We will establish the expression for positive and negative values of lookahead separately.
For $d \ge 0$: We combine (22) and Lemma 2 to obtain the closed form expression for $\mathrm{lmmse}(d, \gamma)$. In addition,
from Lemma 3 we note that

$$\mathrm{cmmse}(\gamma) = \frac{\sqrt{\alpha^2 + \gamma} - \alpha}{\gamma}, \tag{25}$$

$$\mathrm{mmse}(\gamma) = \frac{1}{2\sqrt{\alpha^2 + \gamma}}. \tag{26}$$

Using the above expressions, we can express the estimation error with finite lookahead $d \ge 0$ in the following
succinct manner

$$\mathrm{lmmse}(d, \gamma) = (1 - e^{-2d\sqrt{\alpha^2 + \gamma}})\, \mathrm{mmse}(\gamma) + e^{-2d\sqrt{\alpha^2 + \gamma}}\, \mathrm{cmmse}(\gamma). \tag{27}$$

For $d < 0$: We denote the absolute value of d as |d|, and note that

$$Y_{-\infty}^{-|d|} - X_{-|d|} - X_0 \tag{28}$$

forms a Markov triplet. We further note that $(X_0, X_{-|d|}, Y_{-\infty}^{-|d|})$ are jointly Gaussian. In particular, we know that

$$X_0 | X_{-|d|} \sim \mathcal{N}\left(X_{-|d|}\, e^{-\alpha|d|},\ \frac{1}{2\alpha}(1 - e^{-2\alpha|d|})\right), \tag{29}$$

$$\mathrm{Var}(X_{-|d|} | Y_{-\infty}^{-|d|}) = \mathrm{cmmse}(\gamma). \tag{30}$$

From the above relations, it is easy to arrive at the required quantity $\mathrm{Var}(X_0 | Y_{-\infty}^{-|d|})$ for $d < 0$,

$$\mathrm{lmmse}(d, \gamma) = e^{-2\alpha|d|}\, \mathrm{cmmse}(\gamma) + \frac{1}{2\alpha}(1 - e^{-2\alpha|d|}). \tag{31}$$

This completes the proof. ■
A plot of the estimation error with finite lookahead for the Ornstein-Uhlenbeck process is shown in Fig. 1. It is
seen from the expression in Lemma 4 that the asymptotes at $d = -\infty$ and $d = +\infty$ correspond to the values
$R_X(0)$ and mmse(snr), respectively, for a fixed SNR level snr.

Fig. 1: lmmse(d, snr) vs. lookahead d for the Ornstein-Uhlenbeck process for $\alpha = 0.5$, and snr = 1.
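The two-branch expression of Lemma 4 is easy to evaluate directly. The following sketch tabulates it for the parameters of Fig. 1 ($\alpha = 0.5$, snr = 1, $\beta = 1$) so that the two asymptotes and the continuity at d = 0 can be inspected. The function names are illustrative, and the causal error is written in the algebraically equivalent form $1/(\sqrt{\alpha^2 + \gamma} + \alpha)$ to avoid cancellation at small SNR.

```python
import math

def cmmse(gamma, alpha):
    """Filtering error for OU(alpha), beta = 1; equals (sqrt(a^2+g) - a)/g."""
    return 1.0 / (math.sqrt(alpha**2 + gamma) + alpha)

def mmse(gamma, alpha):
    """Smoothing error for OU(alpha), beta = 1."""
    return 1.0 / (2.0 * math.sqrt(alpha**2 + gamma))

def lmmse(d, gamma, alpha):
    """Finite-lookahead error for OU(alpha), beta = 1, for any real d."""
    if d >= 0:
        w = math.exp(-2.0 * d * math.sqrt(alpha**2 + gamma))
        return (1.0 - w) * mmse(gamma, alpha) + w * cmmse(gamma, alpha)
    w = math.exp(-2.0 * alpha * abs(d))
    return w * cmmse(gamma, alpha) + (1.0 - w) / (2.0 * alpha)

alpha, snr = 0.5, 1.0
for d in (-50.0, -1.0, 0.0, 1.0, 50.0):
    print(f"d = {d:6.1f}   lmmse = {lmmse(d, snr, alpha):.4f}")
print(f"R_X(0) = {1.0 / (2.0 * alpha):.4f}, cmmse = {cmmse(snr, alpha):.4f}, mmse = {mmse(snr, alpha):.4f}")
```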



We now focus on the case where $d \ge 0$, which corresponds to estimation with finite delay. We define, for $d \ge 0$,

$$p_d = \frac{\mathrm{lmmse}(d, \gamma) - \mathrm{mmse}(\gamma)}{\mathrm{cmmse}(\gamma) - \mathrm{mmse}(\gamma)}. \tag{32}$$

From Lemma 4 we observe that for the Ornstein-Uhlenbeck process,

$$p_d = e^{-2d\sqrt{\alpha^2 + \gamma}} \tag{33}$$

$$= e^{-d / \mathrm{mmse}(\gamma)}, \tag{34}$$

where (34) follows from (26). In other words, the estimation error with lookahead approaches the non-causal error
exponentially fast with increasing lookahead. This conveys the importance of lookahead in signal estimation with
finite delay. We can state this observation as follows:

Observation 5: For any AWGN corrupted Ornstein-Uhlenbeck process, the mean squared error with finite positive
lookahead approaches the non-causal error exponentially fast, with decay exponent given by $1/\mathrm{mmse}(\mathrm{snr})$.

D. SNR vs Lookahead tradeoff 

One way to quantify the utility of lookahead in finite delay estimation is to compute the corresponding gain
in SNR. Specifically, we compute the required SNR level which gives the same mmse with lookahead d as the
filtering error at a fixed SNR level snr > 0.

Let $\gamma_d(\mathrm{snr})$ be the value of signal-to-noise ratio that, with lookahead d, provides the same mean square
error as the causal filter with zero lookahead at SNR level snr, i.e.

$$\mathrm{lmmse}(d, \gamma_d(\mathrm{snr})) = \mathrm{cmmse}(\mathrm{snr}), \tag{35}$$

whenever a solution $\gamma_d(\mathrm{snr}) > 0$ exists.

We now present some general observations about $\gamma_d(\mathrm{snr})$.

• $\gamma_d(\mathrm{snr})$ is a monotonically decreasing function of d.

• $\gamma_0(\mathrm{snr}) = \mathrm{snr}$.

• Let $\gamma_\infty(\mathrm{snr}) = \lim_{d \to \infty} \gamma_d(\mathrm{snr})$. Then, we have
$\mathrm{mmse}(\gamma_\infty(\mathrm{snr})) = \mathrm{cmmse}(\mathrm{snr})$. For the Ornstein-Uhlenbeck
process parameterized by $\alpha$, we get the following closed form expression for $\gamma_\infty(\mathrm{snr})$:

$$\gamma_\infty(\mathrm{snr}) = \frac{1}{(2\, \mathrm{cmmse}(\mathrm{snr}))^2} - \alpha^2. \tag{36}$$

• $\gamma_d(\mathrm{snr})$ has a vertical asymptote at $d = d^*_{\mathrm{snr}} < 0$, where $d^*_{\mathrm{snr}}$ is defined as the solution to the equation
$\mathrm{Var}(X_0 | X_{d^*_{\mathrm{snr}}}) = \mathrm{cmmse}(\mathrm{snr})$. Then we have,

$$\lim_{d \downarrow d^*_{\mathrm{snr}}} \gamma_d(\mathrm{snr}) = \infty. \tag{37}$$

In Fig. 2 we illustrate the behavior of $\gamma_d(\mathrm{snr})$ as a function of lookahead, for the Ornstein-Uhlenbeck process
corrupted by Gaussian noise according to the channel in (1).
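The trade-off curve can be traced numerically. The sketch below, under the assumption $\beta = 1$, solves (35) for the equivalent SNR by bisection on a logarithmic scale and evaluates the two limiting quantities in closed form; for $\alpha = 0.2$ and snr = 1 these come out close to the values 0.3320 and -0.9935 quoted in the caption of Fig. 2. The function names, the bracketing interval and the iteration count are illustrative choices.

```python
import math

def cmmse(gamma, alpha):
    """Filtering error for OU(alpha), beta = 1; equals (sqrt(a^2+g) - a)/g."""
    return 1.0 / (math.sqrt(alpha**2 + gamma) + alpha)

def mmse(gamma, alpha):
    return 1.0 / (2.0 * math.sqrt(alpha**2 + gamma))

def lmmse(d, gamma, alpha):
    """Finite-lookahead error of Lemma 4."""
    if d >= 0:
        w = math.exp(-2.0 * d * math.sqrt(alpha**2 + gamma))
        return (1.0 - w) * mmse(gamma, alpha) + w * cmmse(gamma, alpha)
    w = math.exp(-2.0 * alpha * abs(d))
    return w * cmmse(gamma, alpha) + (1.0 - w) / (2.0 * alpha)

def gamma_inf(snr, alpha):
    """SNR at which the smoothing error equals cmmse(snr)."""
    return 1.0 / (2.0 * cmmse(snr, alpha)) ** 2 - alpha**2

def d_star(snr, alpha):
    """Negative lookahead where (1 - exp(-2 a |d|)) / (2 a) = cmmse(snr)."""
    return math.log(1.0 - 2.0 * alpha * cmmse(snr, alpha)) / (2.0 * alpha)

def gamma_d(d, snr, alpha, lo=1e-9, hi=1e9, iters=200):
    """Solve lmmse(d, g) = cmmse(snr) for g; returns inf if no solution."""
    target = cmmse(snr, alpha)
    if lmmse(d, hi, alpha) >= target:
        return math.inf                      # d is at or below the asymptote
    for _ in range(iters):
        mid = math.sqrt(lo * hi)             # bisection on a log scale
        if lmmse(d, mid, alpha) > target:    # error too large: need more SNR
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

alpha, snr = 0.2, 1.0
print(f"gamma_inf = {gamma_inf(snr, alpha):.4f}, d_star = {d_star(snr, alpha):.4f}")
for d in (-0.9, -0.5, 0.0, 1.0, 5.0):
    print(f"d = {d:5.1f}   gamma_d = {gamma_d(d, snr, alpha):.4f}")
```

Bisection applies here because the error with lookahead decreases as the SNR grows, so (35) has at most one solution for each d.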

III. A mixture of Ornstein-Uhlenbeck processes 

Having presented the finite lookahead MMSE for the Ornstein-Uhlenbeck process in Lemma 4, in this section
we obtain the MMSE with lookahead for the class of stochastic processes that are mixtures of Ornstein-Uhlenbeck
processes. We then proceed to establish a general lower bound on the MMSE with lookahead for the class of Gaussian
processes whose spectra can be decomposed as a mixture of spectra of Ornstein-Uhlenbeck processes. For the
same class of Gaussian processes, we also present an upper bound for the finite lookahead MMSE, in terms of a
mismatched estimation loss.

Let $\mu(\alpha)$ be a probability measure defined on $[0, \infty)$. Let $P^{(\alpha)}$ be the law of the Ornstein-Uhlenbeck process
with parameter $\alpha$. Note that each $P^{(\alpha)}$ is the law of a stationary ergodic stochastic process. We define the stationary
distribution generated by taking a $\mu$-mixture of these processes:

$$P = \int P^{(\alpha)}\, d\mu(\alpha). \tag{38}$$

Note that P need not be Gaussian in general.

Lemma 6: Let X be a stationary stochastic process governed by law P which is a $\mu$-mixture of Ornstein-Uhlenbeck
processes, and is corrupted by AWGN at SNR $\gamma$. The MMSE with finite lookahead d is given by

$$\mathrm{lmmse}_P(d, \gamma) = \int \mathrm{lmmse}_\alpha(d, \gamma)\, d\mu(\alpha), \tag{39}$$

where $\mathrm{lmmse}_\alpha(d, \gamma)$ is (as defined in Lemma 4) the corresponding quantity for estimating an Ornstein-Uhlenbeck
process with parameter $\alpha$.

Fig. 2: Plot of $\gamma_d(\mathrm{snr})$ (snr = 1) as a function of lookahead d for the Ornstein-Uhlenbeck process with parameter
$\alpha = 0.2$ (SNR vs. lookahead trade-off curve for the OU($\alpha$) process). In this case $\gamma_\infty = 0.3320$ and $d^* = -0.9935$.

The proof of Lemma 6 follows in one line, upon observing that the underlying 'active mode' is eventually precisely
learned from the infinitely long observation of the noisy process. This relation allows us to compute the minimum
mean squared error with finite lookahead for the large class of processes that can be expressed as mixtures of
Ornstein-Uhlenbeck processes.

As another important corollary of this discussion, consider any Gaussian process G whose spectrum $S_G$ can be
expressed as a mixture of spectra of Ornstein-Uhlenbeck processes, for some appropriate mixing measure $\mu(\alpha)$, i.e.

$$S_G = \int S_\alpha\, d\mu(\alpha), \tag{40}$$

where $S_\alpha$ denotes the spectrum of the Ornstein-Uhlenbeck process with parameter $\alpha$. The approach outlined above
provides us with a computable lower bound on the minimum mean squared error with fixed lookahead d (under
AWGN) for the process G, which we state in the following Lemma.



Lemma 7: For a Gaussian process G with spectrum as in (40), the finite lookahead mmse has the following
lower bound:

$$\mathrm{lmmse}_G(d, \gamma) \ge \int \mathrm{lmmse}_\alpha(d, \gamma)\, d\mu(\alpha), \tag{41}$$

where $\mathrm{lmmse}_\alpha(d, \gamma)$ is characterized explicitly in Lemma 4.

To see why (41) holds, note that its right hand side represents the mmse at lookahead d associated with the process
whose law is expressed in (38), while the left side corresponds to this mmse under a Gaussian source with the
same spectrum. Among all processes with a given spectrum, the Gaussian one is the hardest to estimate, since its
optimal estimator is linear and attains the same error on every process with that spectrum.



In the following example, we will illustrate the bound in (41). In Section IV-A we discuss the computation of
the mmse with lookahead for any stationary Gaussian process, of a known spectrum. This computation, based on
Wiener-Kolmogorov theory, relies on the spectral factorization of the spectrum of the AWGN corrupted process.
This factorization is, in general, difficult to perform. Thus, a bound on the mmse with lookahead, such as in (41)
for a Gaussian process whose spectrum can be expressed as a mixture of spectra of Ornstein-Uhlenbeck processes,
is quite useful.

A natural question that emerges from the above discussion is, what functions can be decomposed into mixtures
of spectra of Ornstein-Uhlenbeck processes? To answer this question, note that one can arrive at any spectrum $S_G$
which can be expressed (up to a multiplicative constant) as

$$S_G(\omega) = \int_0^\infty \frac{1}{\alpha^2 + \omega^2}\, d\mu(\alpha), \tag{42}$$

where we use (15) to characterize $S_\alpha(\omega)$. Equivalently, in the time domain, the desired auto-correlation function
is expressible as

$$R_G(\tau) = \int_0^\infty \frac{e^{-\alpha|\tau|}}{2\alpha}\, d\mu(\alpha), \tag{43}$$

which can be viewed as a real-exponential transform of the measure $\mu$. However, as can be seen from (42), the
spectrum $S_G(\omega)$ is always constrained to be monotonically decreasing with $|\omega|$. This shows that the space of functions
that can be candidates for the spectrum of the process G is not exhausted by the class of functions decomposable
as spectra of Ornstein-Uhlenbeck processes.

A. Illustration of bound in Lemma 7

We consider the simple scenario of a process which is the equal mixture of two Ornstein-Uhlenbeck processes,
$X_1$ and $X_2$, parametrized by $\alpha_1$ and $\alpha_2$ respectively. Specifically, let us define a stationary continuous-time process
X as

$$X = \begin{cases} X_1 & \text{w.p. } 1/2 \\ X_2 & \text{w.p. } 1/2. \end{cases} \tag{44}$$

Note that the process X will not be Gaussian in general. The spectrum $S_X(\cdot)$ of the mixture will be given by the
mixture of the spectra of the constituent Ornstein-Uhlenbeck processes, i.e.

$$S_X(\cdot) = \frac{1}{2} S_{X_1}(\cdot) + \frac{1}{2} S_{X_2}(\cdot), \tag{45}$$

where $S_{X_i}$ denotes the spectrum of $X_i$, i = 1, 2. Now consider a stationary Gaussian process G with the same
spectrum, i.e.

$$S_G(\cdot) = S_X(\cdot). \tag{46}$$

We will consider the noise corruption mechanism in (1) at a signal-to-noise ratio snr. Note that Lemma 4, in
conjunction with (39), allows us to compute the mmse with lookahead $\mathrm{lmmse}_X(d, \mathrm{snr})$ for the X process:

$$\mathrm{lmmse}_X(d, \mathrm{snr}) = \frac{1}{2} \mathrm{lmmse}_{X_1}(d, \mathrm{snr}) + \frac{1}{2} \mathrm{lmmse}_{X_2}(d, \mathrm{snr}), \tag{47}$$

for all d. Note that when $d = -\infty$, the quantity $\mathrm{lmmse}(-\infty, \mathrm{snr})$ will be given by the stationary variance of the
underlying process. Since this quantity depends only on the spectrum of the stationary process, the lower bound in
(41) will coincide with the estimation error for the G process at $d = -\infty$. For the G process, we can analytically
compute the mmse with finite lookahead (116) as well, using the method outlined in Section IV-A. The reader
is referred to Appendix B for a detailed discussion on computing the estimation error with finite lookahead for
the G process under any $\alpha_1, \alpha_2$. For the purposes of this example, we illustrate the bound (41) for the case
when $\alpha_1 = 0.75$ and $\alpha_2 = 0.25$. In Fig. 3 we present the mmse with lookahead for process G, as well as
the lower bound, given by the corresponding quantity for X. For the given choice of parameters, at snr = 1:
$\mathrm{lmmse}_X(-\infty, \mathrm{snr}) = \mathrm{lmmse}_G(-\infty, \mathrm{snr}) = 1.333$; $\mathrm{lmmse}_X(+\infty, \mathrm{snr}) = 0.4425$; $\mathrm{lmmse}_G(+\infty, \mathrm{snr}) = 0.4568$.
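The three limiting values quoted for this example can be reproduced with a few lines of code. The sketch below assumes $\beta = 1$ and, for the Gaussian process G, uses the classical non-causal Wiener error $\frac{1}{2\pi}\int \frac{S(\omega)}{1 + \gamma S(\omega)}\, d\omega$, a formula the text does not state explicitly. The integral over the real line is evaluated with the substitution $\omega = \tan\theta$ and a midpoint rule; all function names and the grid size are illustrative.

```python
import math

def mmse_ou(gamma, alpha):
    """Smoothing error of OU(alpha), beta = 1."""
    return 1.0 / (2.0 * math.sqrt(alpha**2 + gamma))

def mixture_spectrum(omega, alphas, weights):
    """Weighted mixture of OU spectra 1 / (alpha^2 + omega^2)."""
    return sum(w / (a**2 + omega**2) for a, w in zip(alphas, weights))

def mmse_gaussian(gamma, alphas, weights, n=20_000):
    """Smoothing error of a Gaussian process with the mixture spectrum.

    Evaluates (1/2pi) * integral of S / (1 + gamma S) over the real line
    after substituting omega = tan(theta), theta in (-pi/2, pi/2).
    """
    h = math.pi / n
    total = 0.0
    for k in range(n):
        omega = math.tan(-math.pi / 2.0 + (k + 0.5) * h)
        s = mixture_spectrum(omega, alphas, weights)
        total += s / (1.0 + gamma * s) * (1.0 + omega**2)  # d(omega) = sec^2 d(theta)
    return total * h / (2.0 * math.pi)

alphas, weights, snr = (0.75, 0.25), (0.5, 0.5), 1.0
variance = sum(w / (2.0 * a) for a, w in zip(alphas, weights))
mmse_x = sum(w * mmse_ou(snr, a) for a, w in zip(alphas, weights))
mmse_g = mmse_gaussian(snr, alphas, weights)
print(f"stationary variance = {variance:.4f}")
print(f"mixture X, d = +inf = {mmse_x:.4f}")
print(f"Gaussian G, d = +inf = {mmse_g:.4f}")
```

The gap between the last two numbers is the slack in the lower bound at infinite lookahead.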



B. Towards deriving an Upper Bound 

Note that Lemma 7 gives us a computable lower bound for the mmse with lookahead for a Gaussian process G
whose spectrum can be expressed as a mixture of spectra of Ornstein-Uhlenbeck processes, under an appropriate
mixture $\mu$.

Define the mismatched mmse with lookahead for a filter that assumes the underlying process has law $P^{(\beta)}$
[Ornstein-Uhlenbeck process with parameter $\beta$], whereas the true law of the process is $P^{(\alpha)}$:

$$\mathrm{lmmse}_{\alpha,\beta}(d, \gamma) = E_\alpha[(X_0 - E_\beta[X_0 | Y_{-\infty}^{d}])^2], \tag{48}$$

where the outer expectation is with respect to the true law of the underlying process $P^{(\alpha)}$, while the inner expectation
is with respect to the mismatched law $P^{(\beta)}$. Note that the mismatched filter which assumes that the underlying
signal is an Ornstein-Uhlenbeck process with parameter $\beta$ is, in particular, a linear filter. The process G is Gaussian,
and hence the optimal filter for G is also linear. Thus, for any fixed $\beta$, the mismatched filter thus defined will be
suboptimal for process G. Since the error of a linear filter depends on the input only through its spectrum, the error
of this filter on G equals $\int \mathrm{lmmse}_{\alpha,\beta}(d, \gamma)\, d\mu(\alpha)$. This yields the following natural upper bound, which in
conjunction with Lemma 7 can be stated as:

Lemma 8 (Upper and lower bound in terms of mixture of Ornstein-Uhlenbeck spectra):

$$\int \mathrm{lmmse}_\alpha(d, \gamma)\, d\mu(\alpha) \le \mathrm{lmmse}_G(d, \gamma) \le \min_\beta \int \mathrm{lmmse}_{\alpha,\beta}(d, \gamma)\, d\mu(\alpha). \tag{49}$$

We should note that the upper bound stated in Lemma 8 can be hard to compute for continuous-time processes,
even though the filter applied is linear. In summary, analysis of lmmse(·, $\gamma$) for the Ornstein-Uhlenbeck process
provides us with a rich class of processes for which we can compute the finite lookahead estimation error. And for
a class of Gaussian processes (43), we have a lower and upper bound on this quantity. These characterizations may
be used to approximate the behavior of the MMSE with finite lookahead for a large class of Gaussian processes,
instead of obtaining the optimal solution which relies on spectral factorization, and can be cumbersome.

Fig. 3: Illustration of the lower bound: plot of MMSE with lookahead for the processes G and X in Example
III-A for snr = 1. X is a mixture of two Ornstein-Uhlenbeck processes with parameters $\alpha_1 = 0.75$ and $\alpha_2 = 0.25$
respectively. G is a stationary Gaussian process with the same spectrum as X.

IV. Stationary Gaussian Processes 

In this section, we focus exclusively on Gaussian processes having a known spectrum. We are interested in 
obtaining the MMSE with finite lookahead in estimating such a process when corrupted by AWGN at a fixed SNR 
level $\gamma$.

A. Calculation of MMSE with finite lookahead via Spectral Factorization 

Let $\{X_t, t \in \mathbb{R}\}$, stationary and Gaussian with power spectral density $S_X(\omega)$, be the input to the continuous-time
Gaussian channel at SNR $\gamma$. $Y_{(\cdot)}$, the noise-corrupted version of $X_{(\cdot)}$, is stationary, Gaussian and has PSD
$S_Y(\omega) = 1 + \gamma S_X(\omega)$. Let $S_Y^+(\omega)$ be the Wiener-Hopf factorization of $S_Y(\omega)$, i.e. a function satisfying
$|S_Y^+(\omega)|^2 = S_Y(\omega)$ with $1/S_Y^+(\omega)$ being the transfer function of a stable causal filter. This factorization exists
whenever the Paley-Wiener conditions are satisfied for $S_Y(\omega)$. Denote by $\tilde{Y}_t$ the output of the filter $1/S_Y^+(\omega)$
applied on $Y_t$; then $\tilde{Y}_t$ is a standard Brownian motion. This theory is well developed and the reader is encouraged
to peruse any standard reference in continuous-time estimation for details (cf., e.g., [10] and references therein).
Let h(t) be the impulse response of the filter with transfer function

$$H(\omega) = \frac{\sqrt{\gamma}\, S_X(\omega)}{S_Y^-(\omega)}, \tag{50}$$

with $S_Y^-(\omega) = S_Y^+(-\omega)$. The classical result by Wiener and Hopf [11], applied to non-causal filtering of stationary
Gaussian signals can be stated as,

$$E[X_0 | Y_{-\infty}^{\infty}] = E[X_0 | \tilde{Y}_{-\infty}^{\infty}] = \int_{-\infty}^{\infty} h(-t)\, d\tilde{Y}_t. \tag{51}$$

Moreover, an expression for the finite-lookahead MMSE estimator of $X_0$ can be immediately derived, using the
fact that $\tilde{Y}_t$ is both a standard Brownian motion and a reversible causal function of the observations:

$$E[X_0 | Y_{-\infty}^{d}] = E\left[\int_{-\infty}^{\infty} h(-t)\, d\tilde{Y}_t \,\Big|\, \tilde{Y}_{-\infty}^{d}\right] \tag{52}$$

$$= \int_{-\infty}^{d} h(-t)\, d\tilde{Y}_t. \tag{53}$$

Using again the fact that $\tilde{Y}_t$ is a Brownian motion, as well as the orthogonality property of MMSE estimators,
we can obtain a simple expression for $\mathrm{lmmse}(d, \gamma)$:

$$\mathrm{lmmse}(d, \gamma) = E\left(X_0 - E[X_0 | Y_{-\infty}^{d}]\right)^2 = E\left(X_0 - E[X_0 | Y_{-\infty}^{\infty}]\right)^2 + E\left(E[X_0 | Y_{-\infty}^{\infty}] - E[X_0 | Y_{-\infty}^{d}]\right)^2 \tag{54}$$

$$= \mathrm{mmse}(\gamma) + E\left(\int_{d}^{\infty} h(-t)\, d\tilde{Y}_t\right)^2 \tag{55}$$

$$= \mathrm{mmse}(\gamma) + \int_{-\infty}^{-d} h^2(t)\, dt. \tag{56}$$

This classical formulation of the lookahead problem for stationary Gaussian processes shows us that the MMSE
with lookahead behaves gracefully with the lookahead d, and is intimately connected to the impulse response h(·)
of the filter induced by the Wiener spectral factorization process. In particular, that the solution to the lookahead
problem for each value of d can be approached in this single unified manner, as shown in (56), is quite satisfying.
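As a consistency check, the spectral-factorization formula (56) can be compared with the closed form of Lemma 4 for the Ornstein-Uhlenbeck process with $\beta = 1$. The sketch below assumes the transfer function $H(\omega) = \sqrt{\gamma}\, S_X(\omega) / S_Y^+(-\omega)$; for the OU spectrum one has $S_Y^+(\omega) = (i\omega + s)/(i\omega + \alpha)$ with $s = \sqrt{\alpha^2 + \gamma}$, which gives the two-sided exponential $h(t) = \frac{\sqrt{\gamma}}{\alpha + s}\left(e^{-\alpha t} \mathbf{1}\{t \ge 0\} + e^{s t} \mathbf{1}\{t < 0\}\right)$. This impulse response is worked out here for illustration and is not quoted from the paper; the function names, the truncation span and the grid size are likewise illustrative.

```python
import math

def h(t, alpha, gamma):
    """Impulse response for OU(alpha), beta = 1 (derived for this sketch)."""
    s = math.sqrt(alpha**2 + gamma)
    c = math.sqrt(gamma) / (alpha + s)
    return c * (math.exp(-alpha * t) if t >= 0 else math.exp(s * t))

def tail_energy(d, alpha, gamma, span=40.0, n=100_000):
    """Midpoint-rule value of the integral of h(t)^2 over (-inf, -d].

    The lower limit is truncated at -d - span, so span must be large
    enough for h to be negligible there.
    """
    step = span / n
    start = -d - span
    return step * sum(h(start + (k + 0.5) * step, alpha, gamma) ** 2 for k in range(n))

def mmse(gamma, alpha):
    return 1.0 / (2.0 * math.sqrt(alpha**2 + gamma))

def lmmse_closed(d, gamma, alpha):
    """Closed form of Lemma 4, with cmmse written as 1/(s + alpha)."""
    s = math.sqrt(alpha**2 + gamma)
    cm = 1.0 / (s + alpha)
    if d >= 0:
        w = math.exp(-2.0 * d * s)
        return (1.0 - w) * mmse(gamma, alpha) + w * cm
    w = math.exp(-2.0 * alpha * abs(d))
    return w * cm + (1.0 - w) / (2.0 * alpha)

alpha, gamma = 0.5, 1.0
for d in (-1.0, 0.0, 0.5, 2.0):
    via_h = mmse(gamma, alpha) + tail_energy(d, alpha, gamma)
    print(f"d = {d:4.1f}   via h: {via_h:.6f}   Lemma 4: {lmmse_closed(d, gamma, alpha):.6f}")
```

Under these assumptions the two columns agree for negative lookahead as well, since the integration range then extends into the causal part of h.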



B. Processes with a rational spectrum 

Let $S_X(\omega)$ be a rational function, i.e. of the form $P(\omega)/Q(\omega)$ with P, Q being finite order polynomials in $\omega$. In this
case $S_Y(\omega)$ is also a rational function, and the Wiener-Hopf factorization can be performed simply by factorizing
the numerator and denominator of $S_Y(\omega)$ and composing $S_Y^+(\omega)$ only from the stable zeros and poles.

Recall the definition of $p_d$ in (32),

$$p_d = \frac{\mathrm{lmmse}(d, \gamma) - \mathrm{mmse}(\gamma)}{\mathrm{cmmse}(\gamma) - \mathrm{mmse}(\gamma)}. \tag{57}$$

Clearly, $p_0 = 1$ and $\lim_{d \to \infty} p_d = 0$. The value of $p_d$ goes to zero at the same rate as lmmse converges to mmse. For
the Ornstein-Uhlenbeck process we observed that $p_d$ converges exponentially to zero. In the following lemma, we
generalize this result to include all Gaussian input processes with rational spectra.

Lemma 9: For any Gaussian input process with a rational spectrum, lmmse(d, snr) approaches mmse(snr)
exponentially fast as $d \to \infty$. Equivalently, $p_d \to 0$ exponentially, as $d \to \infty$.

Proof: Since $S_Y(\omega)$ is a rational function, so is $H(\omega)$ defined in (50). Therefore, h(t) must be a finite sum
of exponentially decreasing functions and consequently $\int_{-\infty}^{-d} h^2(t)\, dt$ must also decrease
exponentially as $d \to \infty$. This in conjunction with the relation in (56) concludes the proof. ■
As an illustration, we consider an equal mixture of two Ornstein-Uhlenbeck processes with parameters $\alpha_1$ and
$\alpha_2$, as in the example of the previous section. In Appendix B the mmse with lookahead is explicitly computed for
a Gaussian process with this spectrum, acting as input to the channel. From these calculations, one can observe
that the corresponding expression for $p_d$ in (119) vanishes exponentially with increasing lookahead. However, it is
important to note that there exist input spectra for which the decay of $p_d$ is not exponentially fast.
As an example, consider the input power spectrum

$$S_X^{\mathrm{triang}}(\omega) = (1 - |\omega|)\, \mathbf{1}\{|\omega| \le 1\}. \tag{58}$$

In Fig. 4 we plot the behaviour of $p_d$ as a function of d for input power spectrum $S_X^{\mathrm{triang}}(\omega)$ and different
SNRs. This demonstrates the polynomial rate of convergence of the finite lookahead estimation error towards the
non-causal MMSE. The values of lmmse(d, snr) were found by numerically approximating h(t), using FFT in order
to perform the factorization of $S_Y(\omega)$.

V. Information Utility Of Lookahead 

Consider a stationary process X t observed through the continuous-time Gaussian channel (JTJ at SNR level 7. 
In the previous sections we have tried to ascertain the benefit of lookahead in mean squared estimation. We now 
address the question, how much information does lookahead provide in general, and whether this quantity has any 
interesting connections with classical estimation theoretic objects. 

For $\tau \ge 0$, let us define the Information Utility $U(\cdot)$ as a function of lookahead $\tau$, to be

$$ U(\tau) = I(X_0; Y_0^{\tau} \mid Y_{-\infty}^{0}). \tag{59} $$

When the input process is Gaussian,

$$ U(\tau) = h(X_0 \mid Y_{-\infty}^{0}) - h(X_0 \mid Y_{-\infty}^{\tau}) = \tfrac{1}{2}\log\big(2\pi e\,\mathrm{Var}(X_0 \mid Y_{-\infty}^{0})\big) - \tfrac{1}{2}\log\big(2\pi e\,\mathrm{Var}(X_0 \mid Y_{-\infty}^{\tau})\big) \tag{60} $$

$$ = \frac{1}{2}\log\left(\frac{\mathrm{cmmse}(\mathrm{snr})}{\mathrm{lmmse}(\tau, \mathrm{snr})}\right). \tag{61} $$

Rearranging the terms, we have

$$ \mathrm{lmmse}(\tau, \mathrm{snr}) = \mathrm{cmmse}(\mathrm{snr})\,\exp(-2U(\tau)). \tag{62} $$

Fig. 4: Plot of $p_d$, as defined in (32), vs. lookahead at various SNRs, for input power spectrum $(1 - |\omega|)\,\mathbf{1}\{|\omega| < 1\}$. The finite lookahead MMSE is seen to converge to the non-causal MMSE at an approximately cubic rate.



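Furthermore, when the input is non-Gaussian but $h(X_0 \mid Y_{-\infty}^{\tau})$ is well-defined for every $\tau \ge 0$,

$$ h(X_0 \mid Y_{-\infty}^{\tau}) \le E_{y_{-\infty}^{\tau}}\left[\tfrac{1}{2}\log\big(2\pi e\,\mathrm{Var}(X_0 \mid Y_{-\infty}^{\tau} = y_{-\infty}^{\tau})\big)\right] \le \tfrac{1}{2}\log\big(2\pi e\,\mathrm{lmmse}(\tau, \mathrm{snr})\big). \tag{63} $$

The first inequality is due to the fact that the Gaussian distribution has maximum entropy under a variance constraint, and the second inequality follows from Jensen's inequality. Rearranging the terms once more, we have that for every stationary input process,

$$ \mathrm{lmmse}(\tau, \mathrm{snr}) \ge N(X_0 \mid Y_{-\infty}^{0})\,\exp(-2U(\tau)), \tag{64} $$

where $N(Z) = \frac{1}{2\pi e}\exp(2h(Z))$ is the entropy power functional.

We now present a relation between $U(\cdot)$ and the MMSE via a differential equation. Consider an infinitesimal increment $d\tau$ in $U(\tau)$,

$$ U(\tau + d\tau) - U(\tau) = I(X_0; Y_{\tau}^{\tau+d\tau} \mid Y_{-\infty}^{\tau}) = I(X_{\tau}^{\tau+d\tau}; Y_{\tau}^{\tau+d\tau} \mid Y_{-\infty}^{\tau}) - I(X_{\tau}^{\tau+d\tau}; Y_{\tau}^{\tau+d\tau} \mid Y_{-\infty}^{\tau}, X_0). \tag{65} $$

The last equality is due to the Markov chain relationship $X_0 - (X_{\tau}^{\tau+d\tau}, Y_{-\infty}^{\tau}) - Y_{\tau}^{\tau+d\tau}$. Using the time-incremental channel argument introduced in [?, Section III.D], we are able to write

$$ I(X_{\tau}^{\tau+d\tau}; Y_{\tau}^{\tau+d\tau} \mid Y_{-\infty}^{\tau}) = \tfrac{1}{2}\,d\tau \cdot \mathrm{snr} \cdot \mathrm{Var}(X_{\tau} \mid Y_{-\infty}^{\tau}) + o(d\tau), \tag{66} $$

and

$$ I(X_{\tau}^{\tau+d\tau}; Y_{\tau}^{\tau+d\tau} \mid Y_{-\infty}^{\tau}, X_0) = \tfrac{1}{2}\,d\tau \cdot \mathrm{snr} \cdot \mathrm{Var}(X_{\tau} \mid Y_{-\infty}^{\tau}, X_0) + o(d\tau). \tag{67} $$

The expression for the time derivative of the information utility function is therefore

$$ U'(\tau) = \frac{\mathrm{snr}}{2}\Big(\mathrm{Var}(X_{\tau} \mid Y_{-\infty}^{\tau}) - \mathrm{Var}(X_{\tau} \mid Y_{-\infty}^{\tau}, X_0)\Big). \tag{68} $$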



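Since the input is assumed stationary, $\mathrm{Var}(X_{\tau} \mid Y_{-\infty}^{\tau}) = \mathrm{Var}(X_0 \mid Y_{-\infty}^{0}) = \mathrm{cmmse}(\mathrm{snr})$. We notice that $\mathrm{Var}(X_{\tau} \mid Y_{-\infty}^{\tau}, X_0)$ is the causal MMSE in estimating $X_{\tau}$ when $X_0$ is known. The value of $U'(\tau)$ is therefore intimately connected to the effect of initial conditions on the causal MMSE estimation of the process. In particular,

$$ U'(0) = \frac{\mathrm{snr}}{2}\,\mathrm{cmmse}(\mathrm{snr}), \tag{69} $$

meaning that the added value in information of an infinitesimal lookahead is proportional to the causal MMSE.

Noticing that $U(0) = 0$, we may integrate (68) to obtain

$$ U(\tau) = \frac{\mathrm{snr}}{2}\left[\tau\,\mathrm{cmmse}(\mathrm{snr}) - \int_0^{\tau} \mathrm{Var}(X_s \mid Y_{-\infty}^{s}, X_0)\,ds\right]. \tag{70} $$

Specializing to stationary Gaussian input processes, and joining equations (62) and (70), we obtain

$$ \mathrm{lmmse}(\tau, \mathrm{snr}) = \mathrm{cmmse}(\mathrm{snr})\,\exp\left[-\mathrm{snr}\left(\tau\,\mathrm{cmmse}(\mathrm{snr}) - \int_0^{\tau} \mathrm{Var}(X_s \mid Y_{-\infty}^{s}, X_0)\,ds\right)\right]. \tag{71} $$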



For the Ornstein-Uhlenbeck process, $\mathrm{Var}(X_s \mid Y_{-\infty}^{s}, X_0)$ can be calculated explicitly. This, in conjunction with (71), provides an alternative derivation of the finite lookahead MMSE in the Ornstein-Uhlenbeck process that is based on information-estimation arguments. The reader is referred to Appendix C for more details.

VI. Generalized Observation Model 

In this section, we present a new observation model to understand the behavior of estimation error with lookahead. 
Consider a stationary continuous-time stochastic process $X_t$. The observation model is still additive Gaussian noise, where the SNR level of the channel has a jump at $t = 0$. Letting $Y_t$ denote the channel output, we describe the channel as given below.

$$ dY_t = \begin{cases} \sqrt{\mathrm{snr}}\,X_t\,dt + dW_t, & t \le 0 \\ \sqrt{\gamma}\,X_t\,dt + dW_t, & t > 0 \end{cases} \tag{72} $$

where, as usual, $W_\cdot$ is a standard Brownian motion independent of $X$. Note that for $\gamma \ne \mathrm{snr}$, the $(X_t, Y_t)$ process is not jointly stationary. Letting $d, l \ge 0$, we define the finite lookahead estimation error at time $d$ with lookahead $l$ as

$$ f(\mathrm{snr}, \gamma, d, l) = \mathrm{Var}(X_d \mid Y_{-\infty}^{l+d}). \tag{73} $$



We call this a generalized observation model, as for $\gamma = \mathrm{snr}$, we recover the usual time-invariant Gaussian channel. For instance, we note that the error $f$ reduces to the filtering, smoothing or finite lookahead errors, depending on the parameters $\gamma$, $l$ and $d$ as:

$$ \mathrm{cmmse}(\mathrm{snr}) = f(\mathrm{snr}, \mathrm{snr}, d, 0) \tag{74} $$
$$ \mathrm{mmse}(\mathrm{snr}) = f(\mathrm{snr}, \mathrm{snr}, d, \infty) \tag{75} $$
$$ \mathrm{lmmse}(l, \mathrm{snr}) = f(\mathrm{snr}, \mathrm{snr}, d, l) \tag{76} $$

In the following we relate the estimation error with finite lookahead for the observation model described in (72) to the original filtering error.

Theorem 10: Let $X_t$ be any finite variance continuous-time stationary process which is corrupted by the Gaussian channel in (72). Let $f$ be as defined in (73). For fixed $\mathrm{snr} > 0$ and $T > 0$, let $\Gamma \sim U[0, \mathrm{snr}]$ and $L \sim U[0, T]$ be independent random variables. Then

$$ \mathrm{cmmse}(\mathrm{snr}) = E_{\Gamma, L}\big[f(\mathrm{snr}, \Gamma, T - L, L)\big]. \tag{77} $$



Proof: Before proving this result, we take a detour to consider a stationary continuous-time stochastic process $X_0^T$ which is governed according to law $P_{X_0^T}$ and is observed in the window $t \in [0, T]$ through the continuous-time Gaussian channel (1) at SNR level snr. We define the operators which evaluate the filtering and smoothing estimation errors for this process model, as follows:

$$ \mathrm{cmmse}(P_{X_0^T}, \mathrm{snr}) = \int_0^T E\big[(X_t - E[X_t \mid Y_0^t])^2\big]\,dt \tag{78} $$

$$ \mathrm{mmse}(P_{X_0^T}, \mathrm{snr}) = \int_0^T E\big[(X_t - E[X_t \mid Y_0^T])^2\big]\,dt \tag{79} $$

Further, [?] gives us the celebrated relationship between the causal and non-causal estimation errors:

$$ \mathrm{cmmse}(P_{X_0^T}, \mathrm{snr}) = \frac{1}{\mathrm{snr}} \int_0^{\mathrm{snr}} \mathrm{mmse}(P_{X_0^T}, \gamma)\,d\gamma. \tag{80} $$



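We now return to the auxiliary observation model introduced above. We note that the causal and non-causal estimation errors of the process $X_t$ at signal-to-noise ratio snr can be written in terms of

$$ \mathrm{cmmse}(\mathrm{snr}) = \frac{1}{T}\,E_{\mathrm{snr}}\Big[\mathrm{cmmse}\big(P_{X_0^T \mid Y_{-\infty}^{0}}, \mathrm{snr}\big)\Big] \tag{81} $$

$$ \mathrm{mmse}(\mathrm{snr}) = \frac{1}{T}\,E_{\mathrm{snr}}\Big[\mathrm{mmse}\big(P_{X_0^T \mid Y_{-\infty}^{0}}, \mathrm{snr}\big)\Big] \tag{82} $$

where $E_{\mathrm{snr}}$ denotes expectation over $Y_{-\infty}^{0}$. Now employing the relation between the two quantities above, we get

$$ \frac{1}{T}\,E_{\mathrm{snr}}\Big[\mathrm{cmmse}\big(P_{X_0^T \mid Y_{-\infty}^{0}}, \mathrm{snr}\big)\Big] = \frac{1}{T}\,E_{\mathrm{snr}}\left[\frac{1}{\mathrm{snr}} \int_0^{\mathrm{snr}} \mathrm{mmse}\big(P_{X_0^T \mid Y_{-\infty}^{0}}, \gamma\big)\,d\gamma\right]. \tag{83} $$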



Using the definition of $f$ and inserting it in the above, we get the following integral equation, which holds for every stationary process $X_t$ observed through the continuous Gaussian channel as described in (72):

$$ f(\mathrm{snr}, \mathrm{snr}, x, 0) = \frac{1}{\mathrm{snr} \cdot T} \int_0^{\mathrm{snr}} \int_0^{T} f(\mathrm{snr}, \gamma, t, T - t)\,dt\,d\gamma, \tag{84} $$

for all $T$ and $x$. Note that the left hand side of (84) is nothing but the causal squared error at SNR level snr. Note, in particular, that for independent random variables $\Gamma \sim U[0, \mathrm{snr}]$ and $L \sim U[0, T]$, (84) can be expressed as

$$ \mathrm{cmmse}(\mathrm{snr}) = E_{\Gamma, L}\big[f(\mathrm{snr}, \Gamma, T - L, L)\big]. \tag{85} $$

■

Fig. 5: Region in the Lookahead-SNR plane over which the finite lookahead MMSE given by (73) is averaged in Theorem 10 to yield the filtering error at level snr.

It is interesting to note that the double integral over lookahead and SNR is conserved in such a manner by the filtering error, for any arbitrary underlying stationary process. One way of interpreting this result is to take the average of the finite lookahead MMSE (under the given observation model) over a rectangular region in the Lookahead vs. Signal-to-Noise Ratio plane, as depicted in Fig. 5. Theorem 10 tells us that for all underlying processes, this quantity is always the filtering error at level snr. Thus, we find the classical estimation theoretic quantity described by the causal error to emerge as a bridge between the effects of varying lookahead and the signal to noise ratio.
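
The averaging identity of Theorem 10 can be sanity-checked numerically. The sketch below is only an illustration under assumptions that are not part of the text above: the input is taken to be an Ornstein-Uhlenbeck process $dX_t = -\alpha X_t\,dt + \beta\,dB_t$, the error $f(\mathrm{snr}, \gamma, t, T - t)$ is obtained from a standard two-filter (forward Riccati plus backward information) smoothing formula, and both differential equations are integrated with a plain Euler scheme. The function name `theorem10_average` and all parameter values are hypothetical.

```python
import numpy as np

def theorem10_average(alpha, beta, snr, T, n_gamma=50, n_t=4000):
    """Average of f(snr, gamma, t, T - t) over gamma in [0, snr] and t in [0, T].

    Assumes an OU input dX = -alpha X dt + beta dB (an illustrative choice).
    Returns the average together with cmmse(snr) for comparison.
    """
    tau = np.sqrt(alpha**2 + snr * beta**2)
    prior = (tau - alpha) / snr          # Var(X_0 | Y_{-inf}^0) = cmmse(snr)
    dt = T / n_t
    gam = (np.arange(n_gamma) + 0.5) * snr / n_gamma   # midpoint grid in gamma

    # Forward (filtering) variance on [0, T] at SNR gamma, started from the prior.
    pf = np.empty((n_t + 1, n_gamma))
    pf[0] = prior
    for k in range(n_t):
        pf[k + 1] = pf[k] + dt * (beta**2 - 2 * alpha * pf[k] - gam * pf[k]**2)

    # Backward information about X_t carried by the observations on (t, T].
    jb = np.zeros((n_t + 1, n_gamma))
    for k in range(n_t, 0, -1):
        jb[k - 1] = jb[k] + dt * (gam - 2 * alpha * jb[k] - beta**2 * jb[k]**2)

    # Two-filter smoothing: precisions add, giving Var(X_t | Y_{-inf}^T).
    smooth = 1.0 / (1.0 / pf + jb)
    time_avg = (smooth[:-1] + smooth[1:]).sum(axis=0) * dt / (2 * T)  # trapezoid in t
    return time_avg.mean(), prior

avg, cmmse = theorem10_average(alpha=1.0, beta=1.0, snr=2.0, T=1.0)
print(avg, cmmse)
```

Up to discretization error, the two printed numbers should agree for any choice of $T$, which is the content of (77) for this particular input.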



Given Theorem 10 it is natural to investigate whether the relation (77) can be inverted to determine the function $f$ from the causal MMSE. If that were the case, then, in particular, the filtering error would completely determine the estimation loss with finite lookahead. To pose an equivalent question, let us recall Duncan's result in [1] establishing the equivalence of the mutual information rate and the filtering error. On the other hand, by [?], the mutual information rate function $I(\cdot)$ determines the non-causal MMSE. Might the same mutual information rate function $I(\mathrm{snr})$ completely determine the estimation loss for a fixed finite lookahead $d$ as well? This question is addressed in the following section.

VII. Can $I(\cdot)$ Recover Finite-Lookahead MMSE?

In Fig. 6 we present a characteristic plot of the MMSE with lookahead for an arbitrary continuous-time stationary process, corrupted by the Gaussian channel (1) at SNR level snr. In particular, we note three important features of the curve, namely (i) the asymptote at $d = \infty$ is mmse(snr), (ii) lmmse(0, snr) = cmmse(snr) and (iii) the asymptote at $d = -\infty$ is the variance of the stationary process, $\mathrm{Var}(X_0)$. Further, the curve is non-increasing in $d$, since a larger lookahead can only reduce the estimation error.

Fig. 6: Characteristic behavior of the minimum mean squared error with lookahead for any process.

From [1] and [2], we know that the mutual information, the causal, and the non-causal MMSE determine each other according to

$$ \frac{2 I(\mathrm{snr})}{\mathrm{snr}} = \mathrm{cmmse}(\mathrm{snr}) = \frac{1}{\mathrm{snr}} \int_0^{\mathrm{snr}} \mathrm{mmse}(\gamma)\,d\gamma. \tag{86} $$

In other words, the mutual information rate function is sufficient to characterize the behavior of the causal and smoothing errors as functions of snr. In particular, (86) also determines the three important features of the lmmse($\cdot$, snr) curve discussed above, i.e. the value at $d = 0$, and the asymptotes at $d = \pm\infty$ (see footnote 3). It is natural to ask whether it can actually characterize the entire curve lmmse($\cdot$, snr). The following theorem implies that the answer is negative.

Theorem 11: For any finite $d < 0$ there exist stationary continuous-time processes which have the same mutual information rate $I(\mathrm{snr})$ for all snr, but have different minimum mean squared errors with (negative) lookahead $d$.

3 Note that $\mathrm{Var}(X_0) = \mathrm{cmmse}(0)$, which $I(\cdot)$ determines.

One of the corollaries of Theorem 11, which we demonstrate by means of an example, is that the mutual information rate as a function of snr does not in general determine lmmse(d, snr) for any finite non-zero $d$.

Thus, if the triangular relationship between mutual information, the filtering, and the smoothing errors is to be 
extended to accommodate finite lookahead, one will have to resort to distributional properties that go beyond mutual 
information. To see why this must be true in general, we note that the functional lmmse(-, snr) may depend on the 
time-dynamical features of the process, to which the mutual information rate function is invariant. For example, 
we note the following. 

Observation 12: Let $X_t$ be a stationary continuous-time process. Define $X_t^{(a)} = X_{at}$ for a fixed constant $a > 0$. Let $Y_t$ and $Y_t^{(a)}$ denote, respectively, the outputs of the Gaussian channel (1) with $X_t$ and $X_t^{(a)}$ as inputs. Then, for all $d$ and $\mathrm{snr} > 0$,

$$ \mathrm{lmmse}_{X^{(a)}}(d, \mathrm{snr}) = \mathrm{lmmse}_{X}(ad, \mathrm{snr}/a), \tag{87} $$

where the subscript makes the process under consideration explicit. Note that for the special cases when $d \in \{0, \pm\infty\}$, the error of the scaled process results in a scaling of just the SNR parameter, i.e.

$$ \mathrm{mmse}_{X^{(a)}}(\mathrm{snr}) = \mathrm{mmse}_{X}(\mathrm{snr}/a), \tag{88} $$

and

$$ \mathrm{cmmse}_{X^{(a)}}(\mathrm{snr}) = \mathrm{cmmse}_{X}(\mathrm{snr}/a). \tag{89} $$

For all other values of $d$, we see that the error depends on the error at a scaled lookahead, in addition to a scaled SNR level. This indicates the general dependence of the MMSE with finite, non-zero lookahead on the time-dynamics of the underlying process.
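
As a concrete instance of Observation 12, the scaling relation (87) can be checked in closed form for an Ornstein-Uhlenbeck input. The sketch below is only illustrative and rests on two assumptions: it uses the Ornstein-Uhlenbeck expressions recalled in Appendix C (with $\tau = \sqrt{\alpha^2 + \gamma\beta^2}$, $\mathrm{cmmse}(\gamma) = (\tau - \alpha)/\gamma$, $\mathrm{mmse}(\gamma) = \mathrm{cmmse}(\gamma)(\tau + \alpha)/(2\tau)$ and the lookahead formula (130)), and it uses the fact that the time-scaled process $X_{at}$ is again Ornstein-Uhlenbeck with parameters $(a\alpha, \sqrt{a}\beta)$. The helper names are hypothetical.

```python
import math

def ou_errors(alpha, beta, snr):
    """tau, causal MMSE and non-causal MMSE of dX = -alpha X dt + beta dB."""
    tau = math.sqrt(alpha**2 + snr * beta**2)
    cmmse = (tau - alpha) / snr
    mmse = cmmse * (tau + alpha) / (2 * tau)
    return tau, cmmse, mmse

def ou_lmmse(alpha, beta, snr, d):
    """MMSE with lookahead d >= 0, using the closed form (130)."""
    tau, cmmse, mmse = ou_errors(alpha, beta, snr)
    w = math.exp(-2 * tau * d)
    return w * cmmse + (1 - w) * mmse

alpha, beta, snr, a, d = 0.7, 1.3, 2.0, 3.0, 0.4

# Left side of (87): the time-scaled process X_{at}, assumed OU with (a*alpha, sqrt(a)*beta).
lhs = ou_lmmse(a * alpha, math.sqrt(a) * beta, snr, d)
# Right side of (87): original process at lookahead a*d and SNR snr/a.
rhs = ou_lmmse(alpha, beta, snr / a, a * d)
print(lhs, rhs)
```

Both the lookahead and the SNR arguments must be rescaled for the two sides to match; rescaling the SNR alone only works at $d \in \{0, \pm\infty\}$, as noted in (88) and (89).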

One of the consequences of Duncan's result in [1] is that the causal and anti-causal (see footnote 4) errors as functions of snr are the same (due to the mutual information acting as a bridge, which is invariant to the direction of time). Let $X_t$ be a continuous-time stationary stochastic process and $Y_t$ be the output process of the Gaussian channel (1) with $X_t$ as input. We can write,

$$ \frac{2 I(\gamma)}{\gamma} = \mathrm{cmmse}(\gamma) = \mathrm{acmmse}(\gamma), \tag{90} $$

or, writing the right-side equality explicitly,

$$ \mathrm{Var}(X_0 \mid Y_{-\infty}^{0}) = \mathrm{Var}(X_0 \mid Y_{0}^{\infty}). \tag{91} $$

It is now natural to wonder whether (91) carries over to the presence of lookahead, which would have to be the case if the associated conditional variances are to be determined by the mutual information function, which is invariant to the direction of the flow of time. In the following we present an explicit construction of a process for which

$$ \mathrm{Var}(X_0 \mid Y_{-\infty}^{d}) \ne \mathrm{Var}(X_0 \mid Y_{-d}^{\infty}) \tag{92} $$

for some values of $d$. Note that the left and right sides of (92) are the MMSEs with lookahead $d$ associated with the original process and its time-reversed version, respectively. Thus, mutual information alone does not characterize these objects.

4 The anti-causal error, denoted acmmse(snr), is the filtering error for the time-reversed input process.

Fig. 7: Construction of the stationary continuous-time process $X$ from the underlying discrete-time Markov chain $\tilde{X}$, using a random shift $\Delta \sim U[0, 1]$.



A. Construction of a continuous-time process

In this section, we construct a stationary continuous-time process from a stationary discrete-time process. This process will be the input to the continuous-time Gaussian channel in (1).

Let $\tilde{X} = \{\tilde{X}_i\}_{i=-\infty}^{+\infty}$ be a discrete-time stationary process following a certain law $P_{\tilde{X}}$. Let us define a piecewise constant continuous-time process $\bar{X}_t$ such that

$$ \bar{X}_t = \tilde{X}_i, \quad t \in (i-1, i]. \tag{93} $$

We now apply a random shift $\Delta \sim U[0, 1]$ to the $\{\bar{X}_t\}$ process to transform the non-stationary continuous-time process into a stationary one. Let us denote this stationary process by $X$. The process $X$ is observed through the Gaussian channel in (1) at snr = 1, with $Y$ denoting the channel output process. This procedure is illustrated in Fig. 7.

1) Time Reversed Process: Consider the discrete-time process $\tilde{X}^{(R)}$, denoting the time-reversed version of the stationary process $\tilde{X}$. The reversed process will, in general, not have the same law as the forward process (though it will of course inherit its stationarity from that of $\tilde{X}$). Let us construct an equivalent continuous-time process corresponding to $\tilde{X}^{(R)}$, using the procedure described above for the process $\tilde{X}$, and label the resulting stationary continuous-time process by $X^{(R)}$. In the following example, we compare the minimum mean square errors with finite lookahead for the processes $X$ and $X^{(R)}$, for a certain underlying discrete-time Markov process.

B. Proof of Theorem 11: Examples and Discussion

Define a stationary discrete-time Markov chain $\tilde{X}$, on a finite alphabet $\mathcal{X}$, characterized by the transition probability matrix $\mathcal{P}$.

The intuition behind the choice of alphabet and transition probabilities is that we would like the Discrete Time Markov Chain (DTMC) to be highly predictable (under the mean square error criterion) in the forward direction compared to the time-reversed version of the chain. Note that the time-reversed Markov chain will have the same stationary distribution, but different transition probabilities. We transform this discrete-time Markov chain to a stationary continuous-time process by the transformation described above. Because of the difference in the state predictability of the forward and reverse time processes in the DTMC, the continuous-time process thus created will have predictability that behaves differently for the forward and the time-reversed processes and, in turn, the MSE with finite lookahead will also depend on whether the process or its time-reversed version is considered.

1) Infinite SNR scenario: We now concentrate on the case when the signal-to-noise ratio of the channel is infinite. The input to the Gaussian channel is the continuous-time process $X$, which is constructed from the discrete-time chain $\tilde{X}$ as described above. Let us consider negative lookahead $d = -1$. For infinite SNR, the filter will know what the underlying process is exactly, so that

$$ \mathrm{lmmse}(-1, \infty) = \mathrm{Var}(X_0 \mid Y_{-\infty}^{-1}) \tag{94} $$
$$ = \mathrm{Var}(X_0 \mid X_{-\infty}^{-1}) \tag{95} $$
$$ = \mathrm{Var}(\tilde{X}_0 \mid \tilde{X}_{-1}), \tag{96} $$

where (96) follows from the Markovity of $\tilde{X}$. Note that the quantity in (96) is the prediction variance of the DTMC $\tilde{X}$. Let $\nu : \mathcal{X} \to \mathbb{R}$ be any probability measure on the finite alphabet $\mathcal{X}$ and $V[\nu]$ be the variance of this distribution. Let $\mu$ and $P(\tilde{X}_{i+1} \mid \tilde{X}_i)$ denote, respectively, the stationary distribution and the probability transition matrix of the Markov chain $\tilde{X}$. Then the prediction variance is given by

$$ \mathrm{Var}(\tilde{X}_0 \mid \tilde{X}_{-1}) = \sum_{x \in \mathcal{X}} \mu(x)\,V[P(\cdot \mid x)]. \tag{97} $$

In the infinite SNR setting, it is straightforward to see that

$$ \mathrm{lmmse}(d, \infty) = 0, \quad \forall d \ge 0. \tag{98} $$

Since the process $X$ is constructed by a uniformly random shift according to a $U[0, 1]$ law, for each $-1 \le d \le 0$, we have

$$ \mathrm{lmmse}(d, \infty) = |d|\,\mathrm{Var}(\tilde{X}_0 \mid \tilde{X}_{-1}), \quad -1 \le d \le 0. \tag{99} $$

For fixed $d \in [-1, 0]$, with probability $1 - |d|$, the process $Y_{-\infty}^{d}$ will sample $X_0$, in which case the resulting mean squared error is 0 at infinite SNR. Alternately, with probability $|d|$ the error will be given by the prediction variance, i.e. $\mathrm{Var}(\tilde{X}_0 \mid \tilde{X}_{-1})$, which gives rise to (99). A similar analysis can be performed for the time-reversed DTMC $\tilde{X}^{(R)}$. Having found analytic expressions for the MMSE with lookahead for the infinite SNR case, we show a characteristic plot of the same in Fig. 8. Note that the difference in the curves for negative lookahead arises simply due to a difference in prediction variance for the forward and time-reversed DTMCs $\tilde{X}$ and $\tilde{X}^{(R)}$. Since the MMSEs with lookahead are different for $d < 0$ at infinite SNR, they would also be different at a large enough, finite signal-to-noise ratio.

This completes the proof of Theorem 11 as stated for $d < 0$.

To further argue the existence of processes where lmmse(d, snr) are different, also for positive $d$, we provide the following explicit construction of the underlying processes. We then provide plots of the MMSE with lookahead for this process, based on Markov Chain Monte Carlo simulations. The associated plots make it clear that lmmse(d, $\cdot$) are distinct for both processes when $d$ is finite and non-zero.

2) Simulations: Let $\tilde{X}$ be a Discrete Time Markov Chain with the following specifications: The alphabet is $\mathcal{X} = \{5, 0, -5\}$. The probability transition matrix $\mathcal{P}$ is given by:

$$ \mathcal{P} = \begin{pmatrix} 0.6 & 0.4 & 0 \\ 0 & 0.2 & 0.8 \\ 0.875 & 0 & 0.125 \end{pmatrix}, $$

where $\mathcal{P}_{ij} = P(\tilde{X}_{k+1} = x_j \mid \tilde{X}_k = x_i)$. Note that $x_1 = 5$, $x_2 = 0$, $x_3 = -5$. For this Markov chain, we can compute the prediction variance according to (97) to obtain $\mathrm{Var}(\tilde{X}_0 \mid \tilde{X}_{-1}) = 6.6423$. The stationary prior for this DTMC is $\mu = (0.5109, 0.2555, 0.2336)$. For the reversed process $\tilde{X}^{(R)}$, the probability transition matrix is given by

$$ \mathcal{P}^{(R)} = \begin{pmatrix} 0.6 & 0 & 0.4 \\ 0.8 & 0.2 & 0 \\ 0 & 0.875 & 0.125 \end{pmatrix} $$

and the prediction variance is 13.9234.
We performed Monte Carlo simulations to obtain the MSE with finite lookahead (and lookback) for the forward and time-reversed continuous-time processes thus formed. A brief description of the simulation is provided in Appendix D.
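
The numbers quoted for this chain can be reproduced directly from (97). The sketch below is a minimal illustration: the zero entries of the transition matrix are placed so that the stated stationary prior $\mu = (0.5109, 0.2555, 0.2336)$ is obtained (an assumption about the layout of the matrix), and the helper names `stationary`, `reverse_chain`, `prediction_variance` and `lmmse_inf_snr` are hypothetical.

```python
import numpy as np

states = np.array([5.0, 0.0, -5.0])
P = np.array([[0.6,   0.4, 0.0],
              [0.0,   0.2, 0.8],
              [0.875, 0.0, 0.125]])

def stationary(P):
    """Stationary distribution: left eigenvector of P for eigenvalue 1."""
    w, v = np.linalg.eig(P.T)
    mu = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return mu / mu.sum()

def reverse_chain(P, mu):
    """Transition matrix of the time-reversed chain: mu_j P_ji / mu_i."""
    return (P.T * mu) / mu[:, None]

def prediction_variance(P, mu, states):
    """Equation (97): sum over x of mu(x) times the variance of P(.|x)."""
    mean = P @ states
    second = P @ states**2
    return float(mu @ (second - mean**2))

def lmmse_inf_snr(d, pred_var):
    """Infinite-SNR lookahead error from (98) and (99), for d >= -1."""
    return abs(d) * pred_var if d < 0 else 0.0

mu = stationary(P)
P_rev = reverse_chain(P, mu)
var_fwd = prediction_variance(P, mu, states)
var_rev = prediction_variance(P_rev, mu, states)
print(mu, var_fwd, var_rev)
```

The forward and reversed chains share the stationary distribution, so the gap between the two prediction variances comes entirely from the transition structure, which is what separates the two infinite-SNR curves for $-1 \le d < 0$.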

3) Discussion: In Fig. 9 we present the MSE with finite lookahead and lookback for the continuous-time processes $X$ and $X^{(R)}$, denoting the forward and time-reversed stationary noise-free processes, respectively. From Duncan's result, we know that the causal and the anti-causal errors must coincide. This is observed (and highlighted in Fig. 10) by the same values for the MSE with 0 lookahead for the forward and reversed processes. For both positive and negative lookahead, however, the MSEs are different.

Note that we know the asymptotic behavior of the MSEs with lookahead. As $d \to -\infty$, the forward (and reverse) MSE will converge to the variance $\mathrm{Var}(\tilde{X}_0)$ of the corresponding underlying DTMC. For $d \to \infty$, the MSEs converge to the respective non-causal errors (which are naturally equal for the forward and reversed processes). Note that the forward and reversed processes have the same mutual information rate function, as well as the same spectrum. This indicates that such measures are not capable of characterizing the MMSE with finite lookahead.

Fig. 8: Minimum mean squared error with negative lookahead $d$ for SNR = $\infty$ for the processes $X$ and $X^{(R)}$.

The above construction illustrates the complicated nature of lookahead in its role as a link between estimation and information for the Gaussian channel. While answering several important questions, it also raises new ones: do there exist other informational measures of the input-output laws that are sufficient to characterize the estimation error with lookahead? Such directions remain for future exploration.

VIII. Conclusions 

This work can be viewed as a step towards understanding the role of lookahead in information and estimation. We investigate the benefit of finite lookahead in mean squared estimation under additive white Gaussian noise. We study the class of Ornstein-Uhlenbeck processes and explicitly characterize the dependence of squared estimation loss on lookahead and SNR. We extend this result to the class of processes that can be expressed as mixtures of Ornstein-Uhlenbeck processes, and use this characterization to present bounds on the finite lookahead MMSE for a class of Gaussian processes. We observe that Gaussian input processes with rational spectra have the finite lookahead MMSE converging exponentially rapidly from the causal to the non-causal MMSE. We define and obtain relationships for the information utility of lookahead. We then present an expectation identity for a generalized observation model, which presents the filtering error as a double integral, over lookahead and SNR, of the estimation error with finite lookahead. Finally, we illustrate by means of an example that the mutual information rate function does not uniquely characterize the behavior of estimation error with lookahead, except in the special cases of 0 and infinite lookahead. Our example shows that the finite lookahead MMSE depends on features of the process that are not captured by the information rate function, or indeed, the entire spectrum.

Fig. 9: Comparison of the estimation loss with finite lookahead $d$ for the forward and reverse processes, $X$ and $X^{(R)}$ respectively, at SNR = 1. For reference, the infinite SNR curves (dashed) are also shown.



Appendix A 



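In this appendix we prove Lemma 1. Employing the continuous-time Kalman-Bucy filtering framework (cf. [14] for a detailed presentation), we obtain the following differential equation for the error covariance $e_t = \mathrm{Var}(X_t \mid Y_0^t)$:

$$ \frac{de_t}{dt} = -2\alpha e_t - \gamma e_t^2 + \beta^2, \tag{100} $$

where $e_0 = R_X(0) = \frac{\beta^2}{2\alpha}$. We integrate over limits 0 to $d$ and re-arrange terms to get the desired result,

$$ \int_{e_0}^{e_d} \frac{de_t}{-2\alpha e_t - \gamma e_t^2 + \beta^2} = \int_0^d dt \tag{101} $$

$$ d = \frac{1}{2\sqrt{\alpha^2 + \gamma\beta^2}} \left[ \log\left| \frac{\gamma e_d + \alpha + \sqrt{\alpha^2 + \gamma\beta^2}}{-\gamma e_d - \alpha + \sqrt{\alpha^2 + \gamma\beta^2}} \right| - \log|p| \right], \tag{102} $$

which upon simplification yields the following expression for $e_d$:

$$ e_d = -\frac{\alpha}{\gamma} + \frac{\sqrt{\alpha^2 + \gamma\beta^2}}{\gamma} \cdot \frac{e^{2d\sqrt{\alpha^2 + \gamma\beta^2}}\,p + 1}{e^{2d\sqrt{\alpha^2 + \gamma\beta^2}}\,p - 1}, \tag{103} $$

where $p$ is defined as

$$ p = \frac{\gamma R_X(0) + \sqrt{\alpha^2 + \gamma\beta^2} + \alpha}{\gamma R_X(0) - \sqrt{\alpha^2 + \gamma\beta^2} + \alpha}. \tag{104} $$

Fig. 10: Zoomed-in version of Fig. 9 (forward and reversed processes, as a function of lookahead $d$). Note that the curves coincide at $d = 0$, consistent with Duncan's result (91).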



Appendix B 

In this appendix, we illustrate the computation of the MMSE with finite lookahead for a Gaussian process whose spectrum is an equal mixture of the spectra of two Ornstein-Uhlenbeck processes, parametrized by $\alpha_i$, $i = 1, 2$. In this computation, we use the equations developed in Section IV-A.

For simplicity, we will operate in the '$s$' domain for the spectra, where the region of convergence of the Laplace transform will be implicit. Recall from (14) that the spectrum of an Ornstein-Uhlenbeck process with parameter $\alpha$ is given by

$$ S_{\alpha}(s) = \frac{1}{\alpha^2 - s^2}. \tag{105} $$

Thus, the spectrum of the input process $X_t$, which is a mixture of two Ornstein-Uhlenbeck processes, is given by

$$ S_X(s) = \frac{1}{2} \cdot \frac{1}{\alpha_1^2 - s^2} + \frac{1}{2} \cdot \frac{1}{\alpha_2^2 - s^2}. \tag{106} $$

Under the observation model in (1) at a fixed signal-to-noise ratio $\mathrm{snr} > 0$, the output spectrum is given by:

$$ S_Y(s) = 1 + \frac{\mathrm{snr}}{2}\left[\frac{1}{\alpha_1^2 - s^2} + \frac{1}{\alpha_2^2 - s^2}\right]. \tag{107} $$

Performing the Wiener-Hopf decomposition of the output spectrum in (107), we obtain the following spectral factors:

$$ S_Y^{+}(s) = \frac{(s + p_1)(s + p_2)}{(\alpha_1 + s)(\alpha_2 + s)}, \tag{108} $$

and

$$ S_Y^{-}(s) = \frac{(s - p_1)(s - p_2)}{(\alpha_1 - s)(\alpha_2 - s)}, \tag{109} $$

where $p_1, p_2 > 0$ are such that $p_i^2$, $i = 1, 2$ are solutions of $k(x) = 0$, where

$$ k(x) = x^2 - (\alpha_1^2 + \alpha_2^2 + \mathrm{snr})\,x + \frac{\mathrm{snr}}{2}(\alpha_1^2 + \alpha_2^2) + \alpha_1^2\alpha_2^2. \tag{110} $$

Note that $S_Y^{+}(s) = S_Y^{-}(-s)$. Invoking (50), the transfer function of the optimal non-causal filter $H(\cdot)$ is characterized by

$$ H(s) = \frac{\sqrt{\mathrm{snr}}\,S_X(s)}{S_Y^{-}(s)} \tag{111} $$

$$ = \frac{\frac{\sqrt{\mathrm{snr}}}{2}(\alpha_1^2 + \alpha_2^2) - \sqrt{\mathrm{snr}}\,s^2}{(s - p_1)(s - p_2)(\alpha_1 + s)(\alpha_2 + s)} \tag{112} $$

$$ = \frac{u_1}{s - p_1} + \frac{u_2}{s - p_2} + \frac{v_1}{\alpha_1 + s} + \frac{v_2}{\alpha_2 + s}, \tag{113} $$

where (112) follows from (106) and (109), and (113) denotes the partial fraction decomposition of the expression in (112). Having derived an exact expression for the Wiener filter, we are now in a position to compute the desired quantity, i.e. lmmse(d, snr). From (56) we have,

$$ \mathrm{lmmse}(d, \mathrm{snr}) = \mathrm{mmse}(\mathrm{snr}) + \int_{-\infty}^{-d} h^2(t)\,dt. \tag{114} $$

From the classical result by Wiener [13], we know that the MMSE of a Gaussian process is given by

$$ \mathrm{mmse}(\mathrm{snr}) = \int_{-\infty}^{\infty} \frac{S_X(j\omega)}{1 + \mathrm{snr}\,S_X(j\omega)}\,\frac{d\omega}{2\pi}. \tag{115} $$

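Using (113) and (114), we are now in a position to explicitly compute the MMSE with finite lookahead for the $X$ process corrupted by the Gaussian channel:

$$ \mathrm{lmmse}_X(d, \mathrm{snr}) = \begin{cases} \mathrm{mmse}(\mathrm{snr}) + \dfrac{u_1^2}{2p_1} e^{-2p_1 d} + \dfrac{u_2^2}{2p_2} e^{-2p_2 d} + \dfrac{2u_1u_2}{p_1 + p_2} e^{-(p_1 + p_2)d} & \text{if } d \ge 0 \\[2ex] \mathrm{mmse}(\mathrm{snr}) + C + \dfrac{v_1^2}{2\alpha_1}\big(1 - e^{2\alpha_1 d}\big) + \dfrac{v_2^2}{2\alpha_2}\big(1 - e^{2\alpha_2 d}\big) + \dfrac{2v_1v_2}{\alpha_1 + \alpha_2}\big(1 - e^{(\alpha_1 + \alpha_2)d}\big) & \text{if } d < 0 \end{cases} \tag{116} $$

where

$$ C = \frac{u_1^2}{2p_1} + \frac{u_2^2}{2p_2} + \frac{2u_1u_2}{p_1 + p_2}. \tag{117} $$

Recall that the rate of convergence of the MMSE with lookahead, from causal to non-causal error, was defined as

$$ p_d = \frac{\mathrm{lmmse}(d, \mathrm{snr}) - \mathrm{mmse}(\mathrm{snr})}{\mathrm{cmmse}(\mathrm{snr}) - \mathrm{mmse}(\mathrm{snr})}. \tag{118} $$

For the Gaussian input process with spectrum as given in (106), we can use (116) to compute the quantity $p_d$ for $d \ge 0$:

$$ p_d = \frac{1}{C}\left[\frac{u_1^2}{2p_1} e^{-2p_1 d} + \frac{u_2^2}{2p_2} e^{-2p_2 d} + \frac{2u_1u_2}{p_1 + p_2} e^{-(p_1 + p_2)d}\right]. \tag{119} $$

Clearly $p_d \to 0$ for large $d$, in accordance with Lemma 9.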
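
The steps of this appendix can be traced numerically. The sketch below is an illustration under stated assumptions rather than the computation used for the paper: it takes the spectral factorization in the form $S_Y^{\pm}$ with $p_i^2$ the roots of $k(x)$, reads $u_i, v_i$ as the residues of $H(s)$, and compares the resulting constants against two closed forms that are not stated in the text, namely $\mathrm{cmmse}(\mathrm{snr}) = (p_1 + p_2 - \alpha_1 - \alpha_2)/\mathrm{snr}$ (from Duncan's relation together with the Gaussian mutual information rate) and a partial-fraction evaluation of the Wiener integral for mmse(snr). The function name `mixture_terms` and the parameter values are hypothetical.

```python
import numpy as np

def mixture_terms(a1, a2, snr):
    """Constants for an equal mixture of two OU spectra 1/(a_i^2 - s^2), a1 != a2."""
    m = 0.5 * (a1**2 + a2**2)
    # p_i^2 are the two (real, positive) roots of the quadratic k(x).
    x = np.roots([1.0, -(a1**2 + a2**2 + snr), snr * m + (a1 * a2)**2]).real
    p1, p2 = np.sqrt(x)

    num = lambda s: np.sqrt(snr) * (m - s**2)   # numerator of H(s)
    # Residues of H(s) at s = p1, p2 (anti-causal part) and s = -a1, -a2 (causal part).
    u1 = num(p1) / ((p1 - p2) * (a1 + p1) * (a2 + p1))
    u2 = num(p2) / ((p2 - p1) * (a1 + p2) * (a2 + p2))
    v1 = num(-a1) / ((-a1 - p1) * (-a1 - p2) * (a2 - a1))
    v2 = num(-a2) / ((-a2 - p1) * (-a2 - p2) * (a1 - a2))

    C = u1**2 / (2 * p1) + u2**2 / (2 * p2) + 2 * u1 * u2 / (p1 + p2)
    Cv = v1**2 / (2 * a1) + v2**2 / (2 * a2) + 2 * v1 * v2 / (a1 + a2)

    # Closed forms used only as a cross-check (assumptions, see lead-in).
    A = (m - p1**2) / (p2**2 - p1**2)
    B = (m - p2**2) / (p1**2 - p2**2)
    mmse = A / (2 * p1) + B / (2 * p2)
    cmmse = (p1 + p2 - a1 - a2) / snr
    var0 = 0.5 * (1 / (2 * a1) + 1 / (2 * a2))

    def rate(d):
        """Normalized excess error p_d for d >= 0."""
        return (u1**2 / (2 * p1) * np.exp(-2 * p1 * d)
                + u2**2 / (2 * p2) * np.exp(-2 * p2 * d)
                + 2 * u1 * u2 / (p1 + p2) * np.exp(-(p1 + p2) * d)) / C

    return dict(C=C, Cv=Cv, mmse=mmse, cmmse=cmmse, var0=var0, rate=rate)

t = mixture_terms(1.0, 2.0, 3.0)
print(t["C"], t["cmmse"] - t["mmse"])
print(t["C"] + t["Cv"], t["var0"] - t["mmse"])
```

The first printed pair checks that the anti-causal part of $h$ accounts for exactly cmmse(snr) - mmse(snr), and the second that the full energy of $h$ equals $\mathrm{Var}(X_0)$ - mmse(snr), which are the $d = 0$ and $d \to -\infty$ limits of the lookahead error.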



Appendix C 



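Let $e_t = \mathrm{Var}(X_t \mid Y_0^t, X_0)$, and notice that $e_t$ satisfies the Kalman-Bucy differential equation (100), with the initial condition $e_0 = 0$. Applying a similar integration procedure to the one described above, we find that

$$ e_d = -\frac{\alpha}{\gamma} + \frac{\sqrt{\alpha^2 + \gamma\beta^2}}{\gamma} \cdot \frac{e^{2d\sqrt{\alpha^2 + \gamma\beta^2}}\,p - 1}{e^{2d\sqrt{\alpha^2 + \gamma\beta^2}}\,p + 1}, \tag{120} $$

with

$$ p = \frac{\sqrt{\alpha^2 + \gamma\beta^2} + \alpha}{\sqrt{\alpha^2 + \gamma\beta^2} - \alpha}. \tag{121} $$

We define the quantity,

$$ \tau = \sqrt{\alpha^2 + \gamma\beta^2}. \tag{122} $$

Equations (19) and (20) can be rewritten as

$$ \mathrm{cmmse}(\gamma) = \frac{\tau - \alpha}{\gamma}, \qquad \mathrm{mmse}(\gamma) = \frac{\tau^2 - \alpha^2}{2\tau\gamma} = \mathrm{cmmse}(\gamma)\,\frac{\tau + \alpha}{2\tau}. \tag{123} $$

Using the above relations, we have

$$ \mathrm{Var}(X_d \mid Y_0^d, X_0) = -\frac{\alpha}{\gamma} + \frac{\tau}{\gamma}\left[\frac{e^{2d\tau}(\tau + \alpha) - (\tau - \alpha)}{e^{2d\tau}(\tau + \alpha) + (\tau - \alpha)}\right] \tag{124} $$

$$ = \mathrm{cmmse}(\gamma) - \frac{2\tau}{\gamma}\left[\frac{\tau - \alpha}{e^{2d\tau}(\tau + \alpha) + (\tau - \alpha)}\right]. \tag{125} $$

Integrating, we obtain

$$ \int_0^t \mathrm{Var}(X_s \mid Y_0^s, X_0)\,ds = t\,\mathrm{cmmse}(\gamma) - \frac{2\tau}{\gamma}\left[t - \frac{1}{2\tau}\log\left(\frac{e^{2\tau t}(\tau + \alpha) + (\tau - \alpha)}{2\tau}\right)\right]. \tag{126} $$

Since the Ornstein-Uhlenbeck process is Markov, it satisfies $\mathrm{Var}(X_t \mid Y_{-\infty}^{t}, X_0) = \mathrm{Var}(X_t \mid Y_0^t, X_0)$ for any $t \ge 0$. We may therefore use the relation in (70) in conjunction with (126) to write an expression for the information utility of lookahead for this process:

$$ 2U(t) = 2\tau t - \log\left(\frac{e^{2\tau t}(\tau + \alpha) + (\tau - \alpha)}{2\tau}\right) \tag{127} $$

$$ = -\log\left(e^{-2\tau t} + \big(1 - e^{-2\tau t}\big)\frac{\tau + \alpha}{2\tau}\right). \tag{128} $$

Since the Ornstein-Uhlenbeck process is also stationary and Gaussian, we may plug our expression for $U(\cdot)$ into (71) and obtain the finite lookahead MMSE,

$$ \mathrm{lmmse}(d, \gamma) = \mathrm{cmmse}(\gamma)\,e^{-2U(d)} = \mathrm{cmmse}(\gamma)\left[e^{-2\tau d} + \big(1 - e^{-2\tau d}\big)\frac{\tau + \alpha}{2\tau}\right] \tag{129} $$

$$ = e^{-2\tau d}\,\mathrm{cmmse}(\gamma) + \big(1 - e^{-2\tau d}\big)\,\mathrm{mmse}(\gamma), \tag{130} $$

which recovers Lemma 4 for positive lookahead.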
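
The chain of identities in this appendix can also be checked numerically without using the closed-form integral. The sketch below is only an illustration: it evaluates the conditional variance given $X_0$ in closed form, integrates it with a hand-written trapezoidal rule, forms the information utility via $U(d) = \frac{\gamma}{2}\big[d\,\mathrm{cmmse}(\gamma) - \int_0^d \mathrm{Var}(X_s \mid Y_0^s, X_0)\,ds\big]$, and compares $\mathrm{cmmse}(\gamma)e^{-2U(d)}$ with the convex-combination form of the lookahead error. The function names and parameter values are hypothetical.

```python
import numpy as np

def ou_lmmse_via_utility(alpha, beta, gamma, d, n=20001):
    """lmmse(d, gamma) for an OU input, computed through the information utility."""
    tau = np.sqrt(alpha**2 + gamma * beta**2)
    cmmse = (tau - alpha) / gamma
    s = np.linspace(0.0, d, n)
    # Var(X_s | Y_0^s, X_0): causal error when the initial state is known.
    e = cmmse - (2 * tau / gamma) * (tau - alpha) / (
        np.exp(2 * s * tau) * (tau + alpha) + (tau - alpha))
    integral = np.sum((e[1:] + e[:-1]) * np.diff(s)) / 2   # trapezoidal rule
    U = 0.5 * gamma * (d * cmmse - integral)
    return cmmse * np.exp(-2 * U)

def ou_lmmse_closed_form(alpha, beta, gamma, d):
    """Convex combination of causal and non-causal errors with weight exp(-2 tau d)."""
    tau = np.sqrt(alpha**2 + gamma * beta**2)
    cmmse = (tau - alpha) / gamma
    mmse = cmmse * (tau + alpha) / (2 * tau)
    w = np.exp(-2 * tau * d)
    return w * cmmse + (1 - w) * mmse

print(ou_lmmse_via_utility(1.0, 1.5, 2.0, 0.8), ou_lmmse_closed_form(1.0, 1.5, 2.0, 0.8))
```

The agreement of the two values shows that the exponential decay rate $2\tau$ of the lookahead error is already encoded in how quickly knowledge of $X_0$ stops helping the causal filter.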

Appendix D 

Let $\tilde{X}$ be a stationary discrete-time Markov chain, with known probability transition matrix $\mathcal{P}$. Let $Y_i(\gamma)$ denote the noisy observation of $\tilde{X}_i$ corrupted via independent Gaussian noise, with the signal-to-noise ratio of the measurement being $\gamma$. I.e.,

$$ Y_i(\gamma) = \gamma \tilde{X}_i + W_{\gamma}^{(i)}, \tag{131} $$

where $\{W_{\cdot}^{(i)}\}_i$ are independent standard Brownian motions, which are further independent of the $\tilde{X}_i$'s. Let $X$ denote the corresponding stationary continuous-time process generated by the procedure described in Section VII-A, and let $Y$ denote the noisy process generated via the continuous-time Gaussian channel (1) with input $X$, i.e.

$$ dY_t = X_t\,dt + dW_t, \tag{132} $$

where $W_\cdot$ is a standard Brownian motion independent of the $X$ process.
Define

$$ h(\gamma_1, \gamma_2) = \mathrm{Var}\big(\tilde{X}_0 \mid Y_{-\infty}^{-1}(1), Y_0(\gamma_1), Y_1(\gamma_2)\big). \tag{133} $$

Let, as before,

$$ \mathrm{lmmse}(d) = \mathrm{Var}(X_0 \mid Y_{-\infty}^{d}). \tag{134} $$

For $0 \le d \le 1$, note that

$$ \mathrm{lmmse}(d) = \int_0^d h(1, d - u)\,du + \int_d^1 h(1 + d - u, 0)\,du. \tag{135} $$

A. MCMC approach to estimating $h(\cdot, \cdot)$

Note that $\{\tilde{X}_i, Y_i\}_{i \ge 1}$ form a Hidden Markov process. By using state estimation for HMPs, the following computation can be performed for any $i$,

$$ \hat{X}_i = E\big[\tilde{X}_i \mid Y_1^{i-1}(1), Y_i(\gamma_1), Y_{i+1}(\gamma_2)\big]. \tag{136} $$

Also, one can observe that

$$ \frac{1}{M}\sum_{i=1}^{M} (\tilde{X}_i - \hat{X}_i)^2 \to h(\gamma_1, \gamma_2), \tag{137} $$

as $M \to \infty$. In the simulations, we chose $M = 10000$. For this value of $M$, the quantity on the l.h.s. of (137) was used to approximate $h(\cdot, \cdot)$, which was then used in the expression for lmmse(d) in (135) to obtain the desired values of MSE with finite lookahead, via a Monte Carlo approach.
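
A minimal sketch of the estimator described above is given below. It is not the simulation code behind the reported figures: it assumes the three-state chain of Section VII-B (with the zero entries of the transition matrix placed to match the stated stationary prior), draws the observations $Y_i(\gamma) \sim \mathcal{N}(\gamma \tilde{X}_i, \gamma)$ independently for each of the three roles (past, current, next), and uses a plain forward HMM filter with one step of lookahead. The function name `estimate_h` and the seed are hypothetical.

```python
import numpy as np

def estimate_h(P, states, g1, g2, M=10000, seed=0):
    """Monte Carlo estimate of h(g1, g2) via forward HMM state estimation."""
    rng = np.random.default_rng(seed)
    n = len(states)
    w, v = np.linalg.eig(P.T)
    mu = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    mu = mu / mu.sum()                      # stationary distribution

    # Simulate the chain (one extra sample for the lookahead observation).
    idx = np.empty(M + 1, dtype=int)
    idx[0] = rng.choice(n, p=mu)
    for k in range(1, M + 1):
        idx[k] = rng.choice(n, p=P[idx[k - 1]])
    x = states[idx]

    def observe(g):
        # Y(g) = g * X + W_g, with W_g ~ N(0, g); g = 0 carries no information.
        return g * x + np.sqrt(g) * rng.standard_normal(M + 1)

    def lik(y, g):
        # Likelihood of one observation under each state (up to a constant).
        if g == 0.0:
            return np.ones(n)
        ll = -(y - g * states) ** 2 / (2.0 * g)
        return np.exp(ll - ll.max())

    y_past, y_now, y_next = observe(1.0), observe(g1), observe(g2)
    pred = mu.copy()                        # P(X_i | past observations at SNR 1)
    sq_err = 0.0
    for i in range(M):
        post = pred * lik(y_now[i], g1) * (P @ lik(y_next[i + 1], g2))
        post /= post.sum()
        sq_err += (x[i] - post @ states) ** 2
        filt = pred * lik(y_past[i], 1.0)   # absorb the full-SNR observation of X_i
        pred = (filt / filt.sum()) @ P
    return sq_err / M

states = np.array([5.0, 0.0, -5.0])
P = np.array([[0.6, 0.4, 0.0], [0.0, 0.2, 0.8], [0.875, 0.0, 0.125]])
print(estimate_h(P, states, 0.0, 0.0), estimate_h(P, states, 1.0, 1.0))
```

With $\gamma_1 = \gamma_2 = 0$ the estimate should sit slightly above the prediction variance of the chain, since the past is then observed only through noise; feeding such estimates into the two integrals over the random shift yields the lookahead error for $0 \le d \le 1$.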

Based on a similar approach, it is also possible to compute via MCMC, lmmse(d) for $-1 \le d \le 0$.